
Some issues about the reproduction #12

Open
forg77 opened this issue Nov 14, 2024 · 6 comments

Comments

forg77 commented Nov 14, 2024

Hello!

I am very interested in your work, and I encountered some issues during the reproduction process.

  • How can I replace the original text encoder with the tuned Llama 3 model? I checked the config file LLM2CLIP-EVA02-L-14-336/configuration_evaclip.py, and I noticed that the model parameters for the text encoder remain the same as those in the original CLIP model. This is a bit confusing to me.

  • If I’m correct, is the run.sh script provided for training CLIP with a frozen Llama 3 encoder?

Looking forward to your reply!

@Yif-Yang (Collaborator)

We will respond to you after the CVPR deadline, thanks for your attention~

@Divyanshupy

I had the same question. I was wondering if access to the LLM text encoder would be possible. Great work!

@Yif-Yang (Collaborator)

@Divyanshupy @forg77 We have updated the caption contrastive fine-tuned version of Llama3-8B-CC (https://huggingface.co/microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned) to assist with your retrieval experiments and training of your own CLIP models. Additionally, the parameters for our adapter and projector have been made available in our OpenAI ViT-L repository (https://huggingface.co/microsoft/LLM2CLIP-Openai-L-14-336). The retrieval testing methods are documented in the model card for reference.

Our tests show retrieval performance exceeding the results reported in the paper, and we encourage you to try it out.

Regarding the EVA series of models, there have been precision mismatches during the conversion to Hugging Face, which are currently being fixed. Updates will be released progressively.

Furthermore, we will provide detailed instructions on how to use LLM2CLIP to fine-tune your own CLIP models in about a week—please stay tuned!
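For anyone trying the retrieval test in the meantime, here is a minimal sketch of how the released pieces might fit together, based on a reading of the two model cards. The `get_image_features` / `get_text_features` method names on the remote-code model, the `LLM2Vec` wrapper settings, and the caption/image inputs are assumptions for illustration and may differ from the official instructions, so treat the model card as the authoritative reference.

```python
# Minimal retrieval sketch (assumes the remote-code interface described in the
# model cards; method names and wrapper settings may differ from the official docs).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor
from llm2vec import LLM2Vec  # assumed text-encoding wrapper used for the LLM side

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. LLM2CLIP vision tower + adapter/projector (OpenAI ViT-L/14-336 repo).
clip_model = AutoModel.from_pretrained(
    "microsoft/LLM2CLIP-Openai-L-14-336",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to(device).eval()
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

# 2. Caption-contrastive fine-tuned Llama3-8B-CC as the (frozen) text encoder.
llm_name = "microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned"
tokenizer = AutoTokenizer.from_pretrained(llm_name)
llm = AutoModel.from_pretrained(
    llm_name, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device).eval()
l2v = LLM2Vec(llm, tokenizer, pooling_mode="mean", max_length=512)

captions = ["a diagram", "a dog", "a cat"]   # illustrative queries
image = Image.open("example.jpg")            # illustrative image path

with torch.no_grad(), torch.autocast(device):
    pixels = image_processor(images=image, return_tensors="pt").pixel_values.to(device)
    img_emb = clip_model.get_image_features(pixels)              # assumed method name
    txt_raw = l2v.encode(captions, convert_to_tensor=True).to(device)
    txt_emb = clip_model.get_text_features(txt_raw)              # assumed: applies the released adapter/projector

    # Cosine similarity between normalized embeddings, as in standard CLIP retrieval.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1)

print("Label probs:", probs)
```

The `l2v.encode` step follows typical llm2vec usage, and `get_text_features` is assumed to project the LLM output into the CLIP embedding space; the model card's retrieval section should have the exact, up-to-date snippet.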

@chaewon-huh

Thank you for the updates and for making the fine-tuned Llama3-8B-CC model available! I’m really looking forward to trying it out and exploring the improvements in retrieval performance.

I was wondering, do you have any plans to release a fine-tuned version of a smaller text encoder, such as Llama 1B? It would be incredibly helpful for experimentation in environments with limited computational resources.

Thanks again for your great work and ongoing support!

@Yif-Yang (Collaborator)


Thanks for your support.
I think we will try to release all the text models we tried, including Llama 3.2 1B, within this week.

@Yif-Yang (Collaborator)

@chaewon-huh We have already released the Llama 3.2 1B model at https://huggingface.co/microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned, please check it out. Thank you for your interest in our work.
