diff --git a/README.md b/README.md
index c4937e53..9a001806 100644
--- a/README.md
+++ b/README.md
@@ -100,6 +100,12 @@ Models can be easily trained in a single node multi-gpu setting
 ```bash
 dice --accelerator "gpu" --strategy "ddp" --dataset_dir "KGs/UMLS" --model Keci --eval_model "train_val_test"
 ```
+Similarly, models can be easily trained in a multi-node multi-gpu setting
+```bash
+torchrun --nnodes 2 --nproc_per_node=gpu --node_rank 0 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula -m dicee.run --trainer torchDDP --dataset_dir KGs/UMLS
+torchrun --nnodes 2 --nproc_per_node=gpu --node_rank 1 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula -m dicee.run --trainer torchDDP --dataset_dir KGs/UMLS
+```
+
 
 Train a KGE model by providing the path of a single file and store all parameters under newly created directory called `KeciFamilyRun`.
 ```bash
diff --git a/docs/index.rst b/docs/index.rst
index da19dcaa..bd3121e8 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -19,16 +19,16 @@ Welcome to DICE Embeddings!
 .. code-block:: bash
 
    // 1 CPU
-   (dicee) $ python -m dicee.run --path_dataset_folder KGs/UMLS
+   (dicee) $ dice --dataset_dir KGs/UMLS
    // 10 CPU
-   (dicee) $ python -m dicee.run --path_dataset_folder KGs/UMLS --num_core 10
+   (dicee) $ dice --dataset_dir KGs/UMLS --num_core 10
    // Distributed Data Parallel (DDP) with all GPUs
-   (dicee) $ python -m dicee.run --trainer PL --accelerator gpu --strategy ddp --path_dataset_folder KGs/UMLS
+   (dicee) $ dice --trainer PL --accelerator gpu --strategy ddp --dataset_dir KGs/UMLS
    // Model Parallel with all GPUs and low precision
-   (dicee) $ python -m dicee.run --trainer PL --accelerator gpu --strategy deepspeed_stage_3 --path_dataset_folder KGs/UMLS --precision 16
+   (dicee) $ dice --trainer PL --accelerator gpu --strategy deepspeed_stage_3 --dataset_dir KGs/UMLS --precision 16
    // DDP with all GPUs on two nodes (felis and nebula):
-   (dicee) cdemir@felis $ torchrun --nnodes 2 --nproc_per_node=gpu --node_rank 0 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula -m dicee.main --trainer torchDDP --path_dataset_folder KGs/UMLS
-   (dicee) cdemir@nebula $ torchrun --nnodes 2 --nproc_per_node=gpu --node_rank 1 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula -m dicee.main --trainer torchDDP --path_dataset_folder KGs/UMLS
+   (dicee) cdemir@felis $ torchrun --nnodes 2 --nproc_per_node=gpu --node_rank 0 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula -m dicee.run --trainer torchDDP --dataset_dir KGs/UMLS
+   (dicee) cdemir@nebula $ torchrun --nnodes 2 --nproc_per_node=gpu --node_rank 1 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula -m dicee.run --trainer torchDDP --dataset_dir KGs/UMLS
 
 .. toctree::
    :maxdepth: 2