PytorchEngine multi-node support v2 #3147
base: main
Conversation
Fantastic job!
cc @jinminxi104
Please prepare a guide about multi-node deployment, covering both the offline pipeline and online serving.
Automatic model downloading is not supported on the ray backend. The user should put the model in the same location on each node.
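For reference, a minimal sketch of what the offline pipeline might look like under that constraint, assuming the model has already been placed at the same local path on every node; the path and `tp` value below are placeholders:

```python
from lmdeploy import pipeline, PytorchEngineConfig

# The model directory must already exist at this exact path on every node,
# since automatic downloading is not supported on the ray backend.
MODEL_PATH = '/models/my-model'  # placeholder path

# tp=16 is a placeholder tensor-parallel degree spanning multiple nodes.
backend_config = PytorchEngineConfig(tp=16)

pipe = pipeline(MODEL_PATH, backend_config=backend_config)
print(pipe(['Hello, multi-node world!']))
```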
@lvhan028 tested OK with
Please merge the latest main so that I can request a full test.
Users can choose the `distributed_executor_backend` when using a single node.

Note: it is encouraged to build the cluster in a Docker container with `--network host`, like vLLM, to ensure each node has the same environment. Automatic model downloading is not supported for now; models should be pre-downloaded on each node.
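As a rough illustration of the single-node option mentioned above, something like the sketch below should apply. The `distributed_executor_backend` field name comes from this PR; the accepted value `'ray'` and the other parameters are assumptions for illustration only.

```python
from lmdeploy import pipeline, PytorchEngineConfig

# On a single node, the executor backend can be selected explicitly.
# 'ray' is assumed to be an accepted value here; check the PR for the actual choices.
backend_config = PytorchEngineConfig(
    tp=2,                                # placeholder tensor-parallel degree
    distributed_executor_backend='ray',  # option discussed in this PR
)

pipe = pipeline('/models/my-model', backend_config=backend_config)
print(pipe(['Hello, world!']))
```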