Replies: 1 comment
-
If you want faster training speed, you can try to set the
-
I'm trying to replicate SAM training from scratch.
I notice that during training, GPU utilization oscillates between 0% and 100%, even when no logging is happening. I am not computing embeddings on the fly; I am loading ones that were already precomputed.
I wonder if it's related to the gradient accumulation step or something else?
I also see that training is slower than SD 2.1 in comparison, even though I can use a bigger batch size.
Any insights?
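For context, a GPU utilization pattern that repeatedly drops to 0% between spikes often means the training step finishes before the next batch is ready, i.e. the data pipeline is the bottleneck rather than the model. Below is a minimal, stdlib-only sketch of the producer-consumer prefetching idea that keeps the consumer fed (in PyTorch this is what `DataLoader`'s `num_workers` and `pin_memory` provide); `load_batch` and all other names here are illustrative stand-ins, not code from the SAM repo.

```python
import queue
import threading
import time

def load_batch(i):
    # Stand-in for reading one precomputed embedding batch from disk
    # (illustrative only; real I/O would happen here).
    time.sleep(0.01)
    return f"batch-{i}"

def prefetching_batches(num_batches, prefetch=4):
    """Yield batches while a background thread loads ahead, so the
    consumer (the training step) rarely sits idle waiting on I/O."""
    q = queue.Queue(maxsize=prefetch)  # bounded buffer caps memory use
    sentinel = object()  # signals end of the stream

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))  # blocks when the buffer is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            break
        yield item

# Consume batches; with prefetching, loading overlaps the "train step".
for batch in prefetching_batches(8):
    pass  # training step would run here
```

If a pattern like this (or raising `num_workers`) smooths out the utilization curve, the stalls were data loading; if not, gradient accumulation boundaries or host-to-device copies are the next things to profile.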