Some issues about the reproduction #12
Comments
We will respond to you after the CVPR deadline. Thank you for your attention. |
I had the same question. I was wondering if access to the LLM text encoder would be possible. Great work! |
@Divyanshupy @forg77 We have released the caption-contrastive fine-tuned Llama3-8B-CC model (https://huggingface.co/microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned) to assist with your retrieval experiments and with training your own CLIP models. The parameters for our adapter and projector are also available in our OpenAI ViT-L repository (https://huggingface.co/microsoft/LLM2CLIP-Openai-L-14-336). The retrieval testing method is documented in the model card for reference. In our tests, retrieval performance exceeds the results reported in the paper, and we encourage you to try it out. For the EVA series of models, precision mismatches arose during the conversion to Hugging Face; these are currently being fixed, and updates will be released progressively. We will also provide detailed instructions on how to use LLM2CLIP to fine-tune your own CLIP models in about a week. Please stay tuned! |
Thank you for the updates and for making the fine-tuned Llama3-8B-CC model available! I’m really looking forward to trying it out and exploring the improvements in retrieval performance. I was wondering, do you have any plans to release a fine-tuned version of a smaller text encoder, such as Llama 1B? It would be incredibly helpful for experimentation in environments with limited computational resources. Thanks again for your great work and ongoing support! |
Thanks for your support. |
@chaewon-huh We have already released a Llama 3.2 1B model at https://huggingface.co/microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned. Please check it out. Thank you for your interest in our work. |
Hello!
I am very interested in your work, and I encountered some issues during the reproduction process.
How can I replace the original text encoder with the tuned Llama 3 model? I checked the config file LLM2CLIP-EVA02-L-14-336/configuration_evaclip.py, and I noticed that the model parameters for the text encoder remain the same as those in the original CLIP model. This is a bit confusing to me.
Also, do I understand correctly that the run.sh script is provided for training CLIP with a frozen Llama 3 encoder?
Looking forward to your reply!