About training replication #8
In addition, the version of transformers installed based on the commit hash below works for inference, but during training the following error occurs:
......(omitted)
I found that the main cause of this issue is that the ......(omitted)
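For reference, installing transformers at a specific commit is typically done via pip's git support; the hash below is a placeholder, not the actual commit referenced in the original post:

```bash
pip install git+https://github.com/huggingface/transformers.git@<COMMIT_HASH>
```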
Installing ......(omitted). However, I found that 1) increasing the batch size adds considerable extra time cost, and 2) increasing the batch size does not introduce much extra GPU memory cost. I'd like to know whether the authors observed similar behavior and therefore decided to use multiple 24G GPUs across different nodes. Thanks!
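For context on the time-versus-memory trade-off mentioned above, gradient accumulation is the usual way to grow the effective batch size on 24G GPUs: it adds wall-clock time (more forward/backward passes per weight update) while keeping peak memory roughly constant. A minimal PyTorch sketch; the model, data, and accumulation factor are illustrative, not VideoLISA's actual training configuration:

```python
import torch
from torch import nn

# Toy setup (illustrative only).
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
data = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

accum_steps = 4  # effective batch size = 8 * 4 = 32, at the memory cost of batch size 8

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # scale so accumulated grads average
    loss.backward()                                           # gradients accumulate in param.grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()        # one weight update per effective batch
        optimizer.zero_grad()
```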
Additionally, I want to know whether the released checkpoint on HuggingFace corresponds to ......(omitted)
Q-Size: 458
......(omitted)
VideoLISA/evaluation/reason_vos/metrics.py:32: RuntimeWarning: invalid value encountered in divide
j = inters / union
......(omitted)
J: 0.3763731515787202
F: 0.4261801187118281
J&F: 0.4012766351452741
It is weird, because there is little room for error in downloading the provided dataset/checkpoint and conducting the two-step evaluation. Would the authors be able to help point out possible reasons? Thanks.
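For what it's worth, J&F above is simply the arithmetic mean of J and F, and that RuntimeWarning usually indicates that `union` is zero for some samples, i.e. both the predicted and ground-truth masks are empty, so `inters / union` evaluates to NaN. A minimal sketch of the failure mode and one common guard (variable names follow the warning above; this is not the repo's actual code):

```python
import numpy as np

pred = np.zeros((4, 4), dtype=bool)  # empty predicted mask
gt = np.zeros((4, 4), dtype=bool)    # empty ground-truth mask

inters = np.sum(pred & gt, dtype=float)
union = np.sum(pred | gt, dtype=float)

# inters / union would emit "invalid value encountered in divide" (0.0 / 0.0 -> NaN).
# A common convention is to score an empty-vs-empty pair as a perfect match:
j = 1.0 if union == 0 else inters / union
print(j)  # 1.0
```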
Please forgive me for having one more question. I noticed that the training script ......(omitted). I would like to know whether the training for the released checkpoint used the VisualQA and VideoQA datasets. Additionally, which VideoQA dataset does Tab. 10 of the paper refer to, and why was the VideoQA dataset removed in the final version?
Hi @qirui-chen , thanks for your interest in VideoLISA.
Feel free to let me know if you have further questions.
Thank you for your reply, which has been very helpful to me!!! In fact, I still have a few more questions and would appreciate your assistance when you have time. They are prioritized as follows:
......(omitted)
Thank you very much for your help and response!
Hi @qirui-chen ,
We have updated the evaluation suite for image benchmarks, including ReasonSeg and the refCOCO series: https://github.com/showlab/VideoLISA?tab=readme-ov-file#image-benchmarks
Regarding post-optimization, it is non-trivial to integrate XMem2 into another codebase. The best practice is to import the inference results into its codebase and run the post-optimization there. Here is the guideline: https://github.com/showlab/VideoLISA?tab=readme-ov-file#post-optimization
About the problem of reproducing the ReasonVOS number: we have carefully investigated the issue. The current checkpoint on HuggingFace is already the final version. The performance mismatch originates from the discrepancy between the cleaned code and the old data structure. We have updated the data and the evaluation code, so you should be able to reproduce the number reported in the paper, except for small numerical differences due to package version differences (torch, transformers, etc.).
Best,
Thank you very much for your response and for providing the code so quickly!!! Maybe one last question: does the current training script ......(omitted)
Hi @qirui-chen , I just updated the training script that was used to produce the final results in the paper (Tab. 1, 2, and 3).
Best,
Thank you for your quick reply, but the updated parameters ......(omitted) seem to no longer correspond to the part after line #L225 in ......(omitted)
That's a nice catch. We used to treat DAVIS as an independent dataset because it was added at a later stage of the project, but during code cleaning before open-sourcing, we re-organized it under the ref-vos dataset.
Thank you for the reply. Your responses are very helpful to me.
Sorry to bother the authors again, but I would like to ask why the phrase "Sure, [SEG]" is added to the input prompt during inference. Shouldn't this be part of the model's output? For example, here. What's even stranger is that adding or omitting this phrase doesn't seem to affect the model's output: the model still ends up outputting "[SEG]. <|end|>". I want to know why this happens. Thank you.
This is a teacher-forcing technique adapted from LISA: to ensure smooth evaluation, we adopt the same teacher-forcing approach as LISA.
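For readers hitting the same question, here is a minimal sketch of what this teacher forcing looks like at inference time; the prompt template and strings are illustrative, not the exact VideoLISA code:

```python
# Teacher forcing at inference: the answer prefix "Sure, [SEG]" is appended to
# the *input* text, so decoding is forced through the [SEG] token rather than
# relying on the model to emit it. This guarantees a [SEG] position exists in
# the sequence, from which the mask embedding can be extracted.
question = "Please segment the person riding a bicycle."
prompt_free = f"USER: {question} ASSISTANT:"                # model must produce "Sure, [SEG]" itself
prompt_forced = f"USER: {question} ASSISTANT: Sure, [SEG]"  # [SEG] is guaranteed to appear

# A well-trained model almost always emits "Sure, [SEG]" on its own, which is
# why adding or omitting the prefix rarely changes the final output
# ("[SEG]. <|end|>"); the prefix just makes evaluation robust to the rare case
# where [SEG] is never generated.
print(prompt_free)
print(prompt_forced)
```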
Thank you for your response; it resolved my issue!!! I would like to ask whether you have encountered situations during inference where the model does not output ......(omitted). I also wonder how to use ......(omitted)
Both ......(omitted)
Thank you for your response. |
Hello authors, thanks for your great work. I encountered an issue while setting up the environment. After installing torch==2.1.0+cu121, I am unable to import torch. It seems that a similar issue was mentioned in link. Could you please double-check the correct version of torch and the installation method? Thank you.
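In case it helps others with the same setup problem: wheels carrying a local version tag like `2.1.0+cu121` are served from the PyTorch wheel index rather than the default PyPI index, so they are usually installed with an explicit index URL (standard PyTorch instructions; please verify against the repo's pinned requirements):

```bash
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
```

If import still fails after that, checking that the installed NVIDIA driver supports CUDA 12.1 is a common next step.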