Hello author, thanks for your excellent work!
I have some questions about reproducing the results. I retrained for 16 epochs on a single 48GB GPU without distributed training, and the reproduced metrics were acc: 0.4411, comp: 0.4156, overall: 0.4284, which fall short of expectations. Could this be caused by training on a single GPU, or are there other likely reasons?
I would greatly appreciate your answers to my questions!
Thank you very much for your answer. I will analyze the loss curve. Could interrupting training and then resuming it with the 'resume' parameter lead to unstable convergence? For reference, here is a minimal sketch of the state I understand a resume path typically needs to restore (purely illustrative; the checkpoint keys and save format may differ from this repo's actual code):
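```python
import torch

def resume_from_checkpoint(path, model, optimizer, scheduler):
    """Restore full training state so resuming does not perturb convergence.

    If only the model weights are restored (optimizer moments and LR-scheduler
    position are lost), the run effectively restarts warmup/decay mid-training,
    which can appear as a bump or plateau in the loss curve.
    NOTE: the keys below ("model", "optimizer", "scheduler", "epoch") are
    hypothetical and must match whatever the repo actually saves.
    """
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])          # network weights
    optimizer.load_state_dict(ckpt["optimizer"])  # e.g. Adam first/second moments
    scheduler.load_state_dict(ckpt["scheduler"])  # current LR / step count
    return ckpt.get("epoch", 0) + 1               # epoch to continue from
```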
Also, do you think training on a single 48GB GPU affects the final results? If the effective batch size matters, the workaround I was considering is gradient accumulation on one GPU, roughly as sketched below (the batch-size and GPU counts are placeholders, not the paper's actual setting):
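```python
import torch

def train_one_epoch(model, dataloader, optimizer, accum_steps=4):
    """Single-GPU loop that accumulates gradients to mimic a larger multi-GPU batch.

    Example (hypothetical numbers): if the reference setup were 4 GPUs x batch 2
    (effective batch 8) and a single 48GB GPU only fits batch 2, accum_steps=4
    keeps the effective batch at 8, so the original learning-rate schedule
    remains a closer match.
    """
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        loss = model(batch)                      # assumes the model returns a scalar loss
        (loss / accum_steps).backward()          # scale so accumulated grads match the big batch
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```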