Impact of acc_grad Setting on Model Performance #6
Hi, dear authors, thanks for the excellent work!

I am attempting to reproduce the experimental results reported in your paper and have noticed a discrepancy. The grad_accumulation_steps argument is specified as 1 in the paper, but it is set to 20 in the run_train.sh script, which changes the total amount of data consumed during training. I tried the setting from the paper (grad_accumulation_steps=1, global_batch_size=128) but failed to reproduce the reported metrics, so I wonder whether the grad_accumulation_steps setting is a critical factor behind the inconsistent results.

Thank you very much for your attention. I look forward to your reply!
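For reference, the arithmetic in question is global_batch_size = n_gpus × per_gpu_batch_size × grad_accumulation_steps. Below is a minimal sketch of a standard PyTorch-style accumulation loop; the function, `model` call signature, and variable names are assumptions for illustration, not the repository's actual trainer:

```python
# Minimal sketch of gradient accumulation (assumed PyTorch-style loop;
# `model`, `loader`, and `optimizer` are placeholders, not this repo's code).
# With a fixed global batch size,
#   global_batch_size = n_gpus * per_gpu_batch_size * grad_accumulation_steps,
# e.g. 8 GPUs * batch_size 8 * grad_acc 2 = 128. Conversely, at a fixed
# per-GPU batch size, grad_accumulation_steps=1 vs 20 means a 20x difference
# in the data consumed per optimizer step.
def train_one_epoch(model, loader, optimizer, grad_accumulation_steps=20):
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        loss = model(inputs, targets)
        # Scale so the accumulated gradient matches one large-batch step.
        (loss / grad_accumulation_steps).backward()
        if (step + 1) % grad_accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```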
Comments

Hi @IceWYB, thanks for your interest.

Thanks for your quick reply!

Hi @IceWYB, you are encouraged to monitor performance with the image-version ReasonSeg during training, which yields more stable results.

Yeah, I agree that the reproduced results on MeViS are not stable. At first I used a global batch size of 128 (8 GPUs, batch_size 8, 2 grad_acc_steps) and ran repeated experiments: the first run yielded a mevis_JF of 29.9, while the second reached 50.0. I also tried fixing the images/videos used in the dataset but still got variable results, partly because the annotations are still randomly sampled; see the seeding sketch after these comments.

I'm wondering whether the …
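On the variance point above, one common mitigation is to seed the whole data pipeline. This is a sketch under the assumption that the dataset's random annotation sampling goes through Python's random, NumPy, or PyTorch RNGs; the function names and the seed value are illustrative, not from this repository:

```python
# Illustrative seeding sketch (not from this repository): pins Python,
# NumPy, and PyTorch RNGs plus DataLoader shuffling and worker-side
# sampling, so repeated runs draw the same data and annotation samples.
import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_everything(seed: int = 42) -> torch.Generator:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    g = torch.Generator()  # drives DataLoader shuffling deterministically
    g.manual_seed(seed)
    return g

def seed_worker(worker_id: int) -> None:
    # Give each DataLoader worker a derived, reproducible seed, so any
    # random annotation sampling inside __getitem__ is repeatable too.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

# Usage (the dataset object is assumed to exist):
# g = seed_everything(42)
# loader = DataLoader(dataset, batch_size=8, shuffle=True,
#                     worker_init_fn=seed_worker, generator=g)
```

Note this only removes sampling variance; optimization noise (e.g., nondeterministic CUDA kernels) can still cause some run-to-run spread.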