
Some issues about the reproduction #12

Open
forg77 opened this issue Nov 14, 2024 · 6 comments

Comments

forg77 commented Nov 14, 2024

Hello!

I am very interested in your work, and I encountered some issues during the reproduction process.

  • How can I replace the original text encoder with the tuned Llama 3 model? I checked the config file LLM2CLIP-EVA02-L-14-336/configuration_evaclip.py, and I noticed that the model parameters for the text encoder remain the same as those in the original CLIP model. This is a bit confusing to me.

  • If I’m correct, is the run.sh script provided for training CLIP with a frozen Llama 3 encoder?

Looking forward to your reply!

@Yif-Yang (Collaborator)

We will respond to you after the CVPR deadline, thanks for your attention~

@Divyanshupy

I had the same question. I was wondering if access to the LLM text encoder would be possible. Great work!

@Yif-Yang (Collaborator)

@Divyanshupy @forg77 We have updated the caption contrastive fine-tuned version of Llama3-8B-CC (https://huggingface.co/microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned) to assist with your retrieval experiments and training of your own CLIP models. Additionally, the parameters for our adapter and projector have been made available in our OpenAI ViT-L repository (https://huggingface.co/microsoft/LLM2CLIP-Openai-L-14-336). The retrieval testing methods are documented in the model card for reference.

Our tests show retrieval performance exceeding the results reported in the paper, and we encourage you to try it out.

Regarding the EVA series of models, there have been precision mismatches during the conversion to Hugging Face, which are currently being fixed. Updates will be released progressively.

Furthermore, we will provide detailed instructions on how to use LLM2CLIP to fine-tune your own CLIP models in about a week—please stay tuned!
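For anyone trying the retrieval test in the meantime, here is a minimal sketch of how the released pieces might fit together, based on a reading of the two model cards. The `get_image_features` / `get_text_features` method names on the remote-code model, the `LLM2Vec` wrapper settings, and the caption/image inputs are assumptions for illustration and may differ from the official instructions, so treat the model card as the authoritative reference.

```python
# Minimal retrieval sketch (assumes the remote-code interface described in the
# model cards; method names and wrapper settings may differ from the official docs).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor
from llm2vec import LLM2Vec  # assumed text-encoding wrapper used for the LLM side

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. LLM2CLIP vision tower + adapter/projector (OpenAI ViT-L/14-336 repo).
clip_model = AutoModel.from_pretrained(
    "microsoft/LLM2CLIP-Openai-L-14-336",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to(device).eval()
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

# 2. Caption-contrastive fine-tuned Llama3-8B-CC as the (frozen) text encoder.
llm_name = "microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned"
tokenizer = AutoTokenizer.from_pretrained(llm_name)
llm = AutoModel.from_pretrained(
    llm_name, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device).eval()
l2v = LLM2Vec(llm, tokenizer, pooling_mode="mean", max_length=512)

captions = ["a diagram", "a dog", "a cat"]   # illustrative queries
image = Image.open("example.jpg")            # illustrative image path

with torch.no_grad(), torch.autocast(device):
    pixels = image_processor(images=image, return_tensors="pt").pixel_values.to(device)
    img_emb = clip_model.get_image_features(pixels)              # assumed method name
    txt_raw = l2v.encode(captions, convert_to_tensor=True).to(device)
    txt_emb = clip_model.get_text_features(txt_raw)              # assumed: applies the released adapter/projector

    # Cosine similarity between normalized embeddings, as in standard CLIP retrieval.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1)

print("Label probs:", probs)
```

The `l2v.encode` step follows typical llm2vec usage, and `get_text_features` is assumed to project the LLM output into the CLIP embedding space; the model card's retrieval section should have the exact, up-to-date snippet.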

@chaewon-huh

Thank you for the updates and for making the fine-tuned Llama3-8B-CC model available! I’m really looking forward to trying it out and exploring the improvements in retrieval performance.

I was wondering, do you have any plans to release a fine-tuned version of a smaller text encoder, such as Llama 1B? It would be incredibly helpful for experimentation in environments with limited computational resources.

Thanks again for your great work and ongoing support!

@Yif-Yang (Collaborator)


Thanks for your support.
I think we will try to release all the text models we tried, including Llama 3.2 1B, within this week.

@Yif-Yang (Collaborator)

@chaewon-huh We have already released the Llama 3.2 1B model at https://huggingface.co/microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned, please check it out. Thank you for your interest in our work.
