Replies: 1 comment
-
@blacklig The main bottleneck most likely comes from loading the training data from the storage device into the GPUs, not from PCIe bandwidth itself: an NVMe SSD tops out at roughly 10 GB/s, while a PCIe x16 link is on the order of 60 GB/s. If you read the original Stable Diffusion paper, there are references to a number of the GPUs being used as a high-speed cache, with their VRAM feeding the training data to the other GPUs.
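To confirm where the bottleneck is, a rough check is to compare host-to-GPU copy throughput against raw read throughput from the drive holding the dataset. A minimal sketch (the file path and tensor size are placeholders, and the disk number is only meaningful with a cold page cache):

```python
# Rough comparison of PCIe (host->GPU) copy speed vs. storage read speed.
import time
import torch

# 1) Host -> GPU transfer over PCIe, using pinned memory (~1 GiB tensor)
x = torch.empty(256 * 1024 * 1024, dtype=torch.float32).pin_memory()
torch.cuda.synchronize()
t0 = time.perf_counter()
x_gpu = x.to("cuda", non_blocking=True)
torch.cuda.synchronize()
print(f"host->GPU copy: ~{x.numel() * 4 / (time.perf_counter() - t0) / 1e9:.1f} GB/s")

# 2) Raw read from the drive the training data lives on
#    ("/data/train_shard.bin" is a placeholder for any multi-GiB file)
t0 = time.perf_counter()
with open("/data/train_shard.bin", "rb") as f:
    n = len(f.read())
print(f"disk read: ~{n / (time.perf_counter() - t0) / 1e9:.1f} GB/s")
```

If the disk number is far below the copy number, faster storage or caching the pre-extracted features in RAM will help more than a faster PCIe slot.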
-
Hi, I am quite new to PixArt and I would like to first train a fine-tune and then a ControlNet.
Where I struggle is how to set up the dataset properly. The tutorial from @kopyl is OK, but only for Google Colab (it uses different paths on their container, e.g. models in /cache, so I was not able to convert it to my local machine without a big hassle; it's simply a different approach than the OP is using). Still, @kopyl at least uses the provided tools to convert a HF dataset and then prepares it by extracting features.
That step is completely missing from the OP's approach and tutorial. It just says to take the SAM dataset, and that's it. But what exactly is it? Do I also need to extract features? Can anyone give me more information about this and point out exactly what the SAM dataset is?
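For context on what that feature-extraction step usually looks like in PixArt-style training: the images are pre-encoded into VAE latents and the captions into T5 embeddings, and the dataloader then serves those tensors instead of raw images. The sketch below is illustrative only, not the repo's official tool; the model names, image size, and token length are assumptions:

```python
# Illustrative feature extraction for a PixArt-style pipeline (not the repo's script).
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL
from transformers import T5Tokenizer, T5EncoderModel

device = "cuda"
# Model names below are examples, not necessarily what the repo uses.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device).eval()
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
text_encoder = T5EncoderModel.from_pretrained("google/flan-t5-xxl").to(device).eval()

to_tensor = transforms.Compose([
    transforms.Resize(512), transforms.CenterCrop(512), transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])

@torch.no_grad()
def extract(image_path: str, caption: str):
    # Image -> VAE latent
    img = to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    latent = vae.encode(img).latent_dist.sample() * vae.config.scaling_factor
    # Caption -> T5 embedding
    tokens = tokenizer(caption, max_length=120, padding="max_length",
                       truncation=True, return_tensors="pt").to(device)
    text_emb = text_encoder(**tokens).last_hidden_state
    return latent.cpu(), text_emb.cpu()

# torch.save(extract("img.jpg", "a photo of ..."), "sample_0.pt")  # paths are placeholders
```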
What also confuses me is the part of the training guideline that says: "You ONLY need to change the config file in config and dataloader in dataset."
I hope some nice soul can help me understand how to run training properly :)
And one more question: is it possible that distributed training via PyTorch or Accelerate cannot use its full potential when run on PCIe-connected 3090 or 4090 cards? Does it need higher-end cards like A100 or H100 to be fully utilized? When I tried some other (SD) trainings on two cards, performance was always worse than on a single GPU. I suspect PCIe is a huge bottleneck? Maybe NVLink could help, but how do you NVLink 8 cards, for example? Thanks
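One way to check whether inter-GPU bandwidth is what holds the multi-GPU runs back is a small NCCL all-reduce benchmark; on 3090/4090 boxes without NVLink, peer traffic goes over PCIe, so gradient synchronization can dominate a training step. A minimal sketch (tensor size and iteration count are arbitrary):

```python
# bench_allreduce.py -- rough inter-GPU bandwidth check.
# Launch with: torchrun --nproc_per_node=2 bench_allreduce.py
import os
import time
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

x = torch.ones(256 * 1024 * 1024, device="cuda")  # 1 GiB of fp32

for _ in range(3):  # warm-up
    dist.all_reduce(x)
torch.cuda.synchronize()

iters = 10
t0 = time.perf_counter()
for _ in range(iters):
    dist.all_reduce(x)
torch.cuda.synchronize()
elapsed = time.perf_counter() - t0

if dist.get_rank() == 0:
    gb = x.numel() * 4 * iters / 1e9
    print(f"all_reduce throughput: ~{gb / elapsed:.1f} GB/s")
    # Also inspect the topology with `nvidia-smi topo -m` and check whether
    # peer-to-peer transfers are active (NCCL_P2P_DISABLE, IOMMU/ACS settings).

dist.destroy_process_group()
```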