Replies: 1 comment
-
@blacklig The main bottleneck most likely comes from loading the training data from the storage device into the GPUs, not from PCIe bandwidth itself: an NVMe SSD tops out at roughly 10 GB/s, while a PCIe x16 link is on the order of 60 GB/s. If you read the original Stable Diffusion paper, there are references to a number of the GPUs being used as a high-speed cache, with their VRAM feeding the training data to the other GPUs.
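To confirm where the bottleneck is, a rough check is to compare host-to-GPU copy throughput against raw read throughput from the drive holding the dataset. A minimal sketch (the file path and tensor size are placeholders, and the disk number is only meaningful with a cold page cache):

```python
# Rough comparison of PCIe (host->GPU) copy speed vs. storage read speed.
import time
import torch

# 1) Host -> GPU transfer over PCIe, using pinned memory (~1 GiB tensor)
x = torch.empty(256 * 1024 * 1024, dtype=torch.float32).pin_memory()
torch.cuda.synchronize()
t0 = time.perf_counter()
x_gpu = x.to("cuda", non_blocking=True)
torch.cuda.synchronize()
print(f"host->GPU copy: ~{x.numel() * 4 / (time.perf_counter() - t0) / 1e9:.1f} GB/s")

# 2) Raw read from the drive the training data lives on
#    ("/data/train_shard.bin" is a placeholder for any multi-GiB file)
t0 = time.perf_counter()
with open("/data/train_shard.bin", "rb") as f:
    n = len(f.read())
print(f"disk read: ~{n / (time.perf_counter() - t0) / 1e9:.1f} GB/s")
```

If the disk number is far below the copy number, faster storage or caching the pre-extracted features in RAM will help more than a faster PCIe slot.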
-
Hi, I am quite new to PixArt and I would like to first train a fine-tune and then a ControlNet.
Where I struggle is how to set up the dataset properly. The tutorial from @kopyl is OK, but only for Google Colab (it uses different paths on their container, e.g. models in /cache, so I was not able to convert it to my local machine without a big hassle; it's simply a different approach than the OP is using). Still, @kopyl at least uses the provided tools to convert a HF dataset and then prepares it by extracting features.
That step is completely missing from the OP's approach and tutorial. It just says to take the SAM dataset, and that's it. But what exactly is it? Do I also need to extract features? Can anyone give me more information about this and point out exactly what the SAM dataset is?
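For context on what that feature-extraction step usually looks like in PixArt-style training: the images are pre-encoded into VAE latents and the captions into T5 embeddings, and the dataloader then serves those tensors instead of raw images. The sketch below is illustrative only, not the repo's official tool; the model names, image size, and token length are assumptions:

```python
# Illustrative feature extraction for a PixArt-style pipeline (not the repo's script).
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL
from transformers import T5Tokenizer, T5EncoderModel

device = "cuda"
# Model names below are examples, not necessarily what the repo uses.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device).eval()
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
text_encoder = T5EncoderModel.from_pretrained("google/flan-t5-xxl").to(device).eval()

to_tensor = transforms.Compose([
    transforms.Resize(512), transforms.CenterCrop(512), transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])

@torch.no_grad()
def extract(image_path: str, caption: str):
    # Image -> VAE latent
    img = to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    latent = vae.encode(img).latent_dist.sample() * vae.config.scaling_factor
    # Caption -> T5 embedding
    tokens = tokenizer(caption, max_length=120, padding="max_length",
                       truncation=True, return_tensors="pt").to(device)
    text_emb = text_encoder(**tokens).last_hidden_state
    return latent.cpu(), text_emb.cpu()

# torch.save(extract("img.jpg", "a photo of ..."), "sample_0.pt")  # paths are placeholders
```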
What also confuses me is the part of the training guideline that says: "You ONLY need to change the config file in config and dataloader in dataset."
I hope some nice soul can help me understand how to run training properly :)
And one more question: is it possible that distributed training via PyTorch or Accelerate cannot use its full potential when run on PCIe-connected 3090 or 4090 cards? Does it need higher-end cards like A100 or H100 to be fully utilized? When I tried some other (SD) trainings on two cards, performance was always worse than on a single GPU. I suspect PCIe is a huge bottleneck? Maybe NVLink could help, but how do you NVLink 8 cards, for example? Thanks
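One way to check whether inter-GPU bandwidth is what holds the multi-GPU runs back is a small NCCL all-reduce benchmark; on 3090/4090 boxes without NVLink, peer traffic goes over PCIe, so gradient synchronization can dominate a training step. A minimal sketch (tensor size and iteration count are arbitrary):

```python
# bench_allreduce.py -- rough inter-GPU bandwidth check.
# Launch with: torchrun --nproc_per_node=2 bench_allreduce.py
import os
import time
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

x = torch.ones(256 * 1024 * 1024, device="cuda")  # 1 GiB of fp32

for _ in range(3):  # warm-up
    dist.all_reduce(x)
torch.cuda.synchronize()

iters = 10
t0 = time.perf_counter()
for _ in range(iters):
    dist.all_reduce(x)
torch.cuda.synchronize()
elapsed = time.perf_counter() - t0

if dist.get_rank() == 0:
    gb = x.numel() * 4 * iters / 1e9
    print(f"all_reduce throughput: ~{gb / elapsed:.1f} GB/s")
    # Also inspect the topology with `nvidia-smi topo -m` and check whether
    # peer-to-peer transfers are active (NCCL_P2P_DISABLE, IOMMU/ACS settings).

dist.destroy_process_group()
```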