
Any plans to integrate GTE model natively into transformers #35568

Closed · 2 tasks done
yaswanth19 opened this issue Jan 8, 2025 · 8 comments

Comments

@yaswanth19 commented Jan 8, 2025

Model description

Any plans to integrate the GTE model natively into transformers? Right now we are using this model with the trust_remote_code=True argument.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Model Implementation: https://huggingface.co/Alibaba-NLP/new-impl/blob/main/modeling.py
Model Weights: https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5
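
For context, this is roughly how the checkpoint is loaded today; a minimal sketch using the standard transformers remote-code pattern, with the model name taken from the weights link above:

```python
from transformers import AutoModel, AutoTokenizer

model_name = "Alibaba-NLP/gte-base-en-v1.5"

# The repo ships its own modeling.py, so transformers must be allowed
# to download and execute that custom code.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
```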

@yaswanth19 (Author)

@ArthurZucker @Rocketknight1 If we do intend to integrate this model then I can work on creating a draft PR.

@mahimairaja

Is there a place where I can help you add the model, @yaswanth19?

@yaswanth19 (Author)

@ArthurZucker A gentle ping.
@tomaarsen Can this also be integrated into sentence-transformers?
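
For reference, the GTE checkpoints can already be used through sentence-transformers today by forwarding the same remote-code flag. A minimal sketch, assuming a sentence-transformers version recent enough to accept trust_remote_code:

```python
from sentence_transformers import SentenceTransformer

# trust_remote_code is passed through to transformers, so the repo's
# custom modeling code still runs under the hood.
model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)

embeddings = model.encode(["an example sentence", "another one"])
print(embeddings.shape)  # (2, embedding_dim)
```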

@Rocketknight1 (Member)

This seems popular enough to justify an integration, yes. WDYT @tomaarsen?

@tomaarsen (Member) commented Jan 17, 2025

@Rocketknight1
I suspect there are 3 popular and promising models built on this architecture:

Beyond that, the authors are now using another implementation on top of Qwen:

Some of the mechanisms are similar to ModernBERT (I see unpadding), but some differ as well (xformers). It might require a good bit of effort to get everything to line up with transformers, and I think there's a chance that there will be no more big models based on this architecture.

  • Tom Aarsen
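
To make the unpadding point above concrete: the idea is to drop pad tokens and run attention only over the real tokens of a batch. A rough sketch of the concept (a hypothetical helper, not the actual GTE or ModernBERT code):

```python
import torch

def unpad_input(hidden_states: torch.Tensor, attention_mask: torch.Tensor):
    """Pack real tokens into one flat sequence, dropping padding.

    hidden_states: (batch, seq_len, hidden)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    batch, seq_len, hidden = hidden_states.shape
    flat_mask = attention_mask.reshape(-1).bool()
    indices = flat_mask.nonzero(as_tuple=True)[0]        # positions of real tokens
    packed = hidden_states.reshape(-1, hidden)[indices]  # (total_tokens, hidden)
    # Per-sequence lengths, needed to rebuild (re-pad) the batch afterwards.
    seqlens = attention_mask.sum(dim=1)
    return packed, indices, seqlens
```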

@yaswanth19 (Author)

@Rocketknight1 Should I start implementing support for this model, or do you think the effort outweighs the potential benefit and we should keep using these models with trust_remote_code?

@Rocketknight1 (Member)

Hi @yaswanth19, given @tomaarsen's comment above, I think it's okay to leave them as trust_remote_code models, especially since newer versions of GTE already exist on a different architecture.

@tomaarsen (Member)

On this topic, the Alibaba team actually just released superior models based on the new ModernBERT architecture today:

I imagine that they might not move forward with their previous architecture, especially considering they mention that the only parameter they changed for these, compared to their previous models, was the base model.

  • Tom Aarsen
