Adding AxoNN's 3D tensor parallelism [WIP] #1086
base: main
Conversation
Under testing.
@Quentin-Anthony I have updated the install instructions to install axonn from a fixed commit - 3ebc34c
Thanks!
@Quentin-Anthony Pushed some communication optimizations and also updated the instructions to install axonn from a newer commit - 45647ea. To enable these optimizations, you just need to set
Steps to run -

1. Install AxoNN (dependencies - PyTorch and mpi4py).

2. Prepare a config file to use AxoNN:
First, set `"use_axonn_model_parallelism": true`. Then set `"depth_model_parallel_size"`, `"row_model_parallel_size"`, and `"column_model_parallel_size"` as per the requirements of your model; the product of these should equal `"model_parallel_size"`. You can also set `"optimize_axonn_communication": true` to enable communication optimizations. These also require you to set the environment variable `export CUDA_DEVICE_MAX_CONNECTIONS=1`.

At a high level, the matrix multiplications in your model will be sharded over `"model_parallel_size"` GPUs.

ToDos -
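As a rough illustration of the sharding idea described in the steps above (this is not AxoNN's actual API - the group sizes mirror the config keys, and the tiny matrices are invented for the example), a weight matrix can be split into a row x column grid of blocks, each "GPU" computes a partial product on its block, and the partials are reduced to recover the full matmul:

```python
# Sketch of row x column sharding of a matmul, the core idea behind tensor
# parallelism. Not AxoNN's API -- all sizes and matrices are illustrative.
# AxoNN's 3D scheme adds a depth dimension on top of this; depth is kept at 1
# here to keep the sketch small.

row_mp, col_mp, depth_mp = 2, 2, 1       # hypothetical config values
model_parallel_size = 4
assert row_mp * col_mp * depth_mp == model_parallel_size  # product rule

def matmul(a, b):
    """Plain dense matmul on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# A 4x4 weight W and a 1x4 input X (values are arbitrary).
W = [[(i * 4 + j) % 5 for j in range(4)] for i in range(4)]
X = [[1, 2, 3, 4]]

full = matmul(X, W)  # unsharded reference result

# Shard W into a row_mp x col_mp grid of 2x2 blocks. The "GPU" at grid
# position (r, c) holds W[2r:2r+2, 2c:2c+2] and the matching slice of X.
out = [[0] * 4]
for r in range(row_mp):
    for c in range(col_mp):
        W_block = [row[2 * c:2 * c + 2] for row in W[2 * r:2 * r + 2]]
        X_slice = [X[0][2 * r:2 * r + 2]]
        partial = matmul(X_slice, W_block)  # local compute on one "GPU"
        for j in range(2):                  # reduce partials across row groups
            out[0][2 * c + j] += partial[0][j]

assert out == full  # sharded result matches the unsharded matmul
```

Here each of the `row_mp * col_mp` block products is independent, which is why the work can be spread over `"model_parallel_size"` GPUs; only the final reduction needs communication.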