Impact of acc_grad Setting on Model Performance #6

Open
IceWYB opened this issue Dec 11, 2024 · 5 comments

Comments

@IceWYB

IceWYB commented Dec 11, 2024

Hi, dear authors, thanks for the excellent work!
I am attempting to reproduce the experimental results reported in your paper and noticed some issues. The grad_accumulation_steps argument is specified as 1 in the paper, but is set to 20 in the run_train.sh script. This discrepancy seems to affect the total amount of data used during training. I tried the setting from the paper (grad_accumulation_steps=1, global_batch_size=128) but failed to reproduce the performance metrics. So I wonder whether the grad_accumulation_steps setting is a critical factor causing inconsistencies in the results.
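For reference, this is how I understand the effective global batch size (a minimal sketch; the variable names are mine, not those used in run_train.sh):

```python
# Minimal sketch of the effective global batch size (names are mine, not the script's).
def global_batch_size(num_gpus, per_gpu_batch_size, grad_accumulation_steps):
    # Total number of samples contributing to one optimizer update.
    return num_gpus * per_gpu_batch_size * grad_accumulation_steps

# Paper setting: grad_accumulation_steps = 1, global batch size = 128.
# With grad_accumulation_steps = 20 (run_train.sh), each optimizer update
# aggregates 20x more samples for the same per-GPU batch size and GPU count.
```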
Thank you very much for your attention. I look forward to your reply!

@JosephPai
Collaborator

Hi @IceWYB, thanks for your interest.
As stated in the instructions here, https://github.com/showlab/VideoLISA?tab=readme-ov-file#training, we use 8 nodes (64 A10 24G GPUs), each GPU has a batch size of 2, and grad_accumulation_steps=1, so the global batch size is 64x2=128.
The final performance can be affected by the batch size, grad_accumulation_steps, learning rate, and even the dataset sampling ratios.
May I know how many GPUs you used during training?
Regarding "fail to reproduce the performance metrics", how large is the discrepancy?
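In case it helps, here is a rough sketch (not our exact launch setup; the variable names are illustrative) of how to keep the global batch size at 128 when training on fewer GPUs:

```python
# Rough sketch: keep the global batch size at 128 when scaling down the GPU count.
TARGET_GLOBAL_BATCH_SIZE = 128      # 64 GPUs x batch size 2 x grad_accumulation_steps 1

num_gpus = 8                        # e.g. a single node
per_gpu_batch_size = 2
grad_accumulation_steps = TARGET_GLOBAL_BATCH_SIZE // (num_gpus * per_gpu_batch_size)
print(grad_accumulation_steps)      # 8 -> 8 GPUs x 2 x 8 = 128
```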

@IceWYB
Author

IceWYB commented Dec 12, 2024

Thanks for your quick reply!
I first tried 8 GPUs with batch_size=8 and grad_accumulation_steps=1 to align with the configuration mentioned in the paper. However, after 10 epochs, the results were significantly different, with the mevis_JF metric only reaching 30. I then experimented with many different batch_size and grad_acc_steps settings but still could not replicate the performance. I noticed that, unlike typical setups, the dataset uses a random sampling strategy. I am curious whether this randomness will introduce significant variability in the results. Did you also encounter similar issues during your training process?
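For reference, a generic way to pin the seeds and check whether the variance comes from this random sampling (a PyTorch-style sketch, not the repo's own code):

```python
# Generic seeding before building the dataloaders (not the repo's own code).
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU RNG
    torch.cuda.manual_seed_all(seed)  # seeds all GPU RNGs
```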
Thank you for your assistance!

@JosephPai
Collaborator

Hi @IceWYB, you are encouraged to monitor the performance with the image-version ReasonSeg during training, which yields more stable results.
I will also try to reproduce the result with smaller scale training this weekend.

@IceWYB
Author

IceWYB commented Dec 14, 2024

Yeah, I agree with you that the reproduced results on mevis are not stable. At first I tried a global batch size of 128 (8 GPUs, batch_size=8, grad_acc_steps=2) and conducted repeated experiments. The first run resulted in a mevis_JF metric of 29.9, while the second run achieved 50.0. I also tried to fix the images/videos used in the dataset but still got variable results, partly because the annotations are still randomly sampled.
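For reference, pinning the annotation choice as well could look something like this (a hypothetical sketch; the function and argument names are mine, not the dataset's):

```python
# Hypothetical sketch: key the RNG on the sample index so that each video
# always draws the same annotation across runs. Names are illustrative only.
import random


def pick_annotation(annotations, sample_idx, base_seed=42):
    rng = random.Random(base_seed + sample_idx)  # per-sample deterministic RNG
    return rng.choice(annotations)
```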
Looking forward to your results with smaller-scale training soon!

@Lexarymade

I'm wondering whether the mevis here refers to the mevis_valid_u.
