Add giga embeddings #1741

Samoed · 2025-01-09T12:38:22Z

Added InstructSentenceTransformerWrapper to use SentenceTransformer models with instructions.
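A rough sketch of what "models with instructions" means in practice: the task instruction is formatted through a template and prepended to each input before encoding. The function names and the template string below are illustrative assumptions, not the wrapper's actual code or the model's real prompt format.

```python
from typing import Callable, List, Optional, Union

# Hypothetical type: a template is either a format string with an
# "{instruction}" placeholder or a callable that builds the prefix.
Template = Union[str, Callable[[str], str]]


def format_instruction(instruction: str, template: Optional[Template] = None) -> str:
    """Turn a raw task instruction into the prefix placed before each text."""
    if template is None:
        return instruction
    if callable(template):
        return template(instruction)
    return template.format(instruction=instruction)


def add_instruction(
    sentences: List[str],
    instruction: Optional[str],
    template: Optional[Template] = None,
) -> List[str]:
    """Prepend the formatted instruction to every sentence (no-op if None)."""
    if instruction is None:
        return list(sentences)
    prefix = format_instruction(instruction, template)
    return [prefix + sentence for sentence in sentences]


# Illustrative template only; the real template depends on the model.
example = add_instruction(
    ["how do I reset my password?"],
    "Retrieve relevant answers",
    template="Instruct: {instruction}\nQuery: ",
)
print(example[0])
```

Accepting either a string or a callable keeps the sketch flexible for models whose prompt format cannot be expressed as a single format string.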

Ref embeddings-benchmark/results#77
@ekolodin My results are a bit higher. Could you rerun your results using this implementation, or provide your implementation? My code for run

Checklist

Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.

Task	Leaderboard	PR
AmazonCounterfactualClassification	90.31	94.1493
EmotionClassification	73.1	92.075
ToxicConversationsClassification	75.37	90.1123
SprintDuplicateQuestions	86.3	93.487
TwitterSemEval2015	63.42	65.8234
SciDocsRR	88.01	84.5092
AskUbuntuDupQuestions	58.19	61.41
SCIDOCS	19.16	20.056
SciFact	72.9	67.707
STS16	81.09	79.6737
STSBenchmark	82.2	78.9945
SummEval	27.86	30.9884
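The per-task gap between the two columns can be tabulated with a short script. This is only a sketch: the scores are copied from the table above, and the variable names are illustrative.

```python
# (leaderboard score, score from this PR), copied from the table above.
scores = {
    "AmazonCounterfactualClassification": (90.31, 94.1493),
    "EmotionClassification": (73.1, 92.075),
    "ToxicConversationsClassification": (75.37, 90.1123),
    "SprintDuplicateQuestions": (86.3, 93.487),
    "TwitterSemEval2015": (63.42, 65.8234),
    "SciDocsRR": (88.01, 84.5092),
    "AskUbuntuDupQuestions": (58.19, 61.41),
    "SCIDOCS": (19.16, 20.056),
    "SciFact": (72.9, 67.707),
    "STS16": (81.09, 79.6737),
    "STSBenchmark": (82.2, 78.9945),
    "SummEval": (27.86, 30.9884),
}

# Positive delta: the PR implementation scores higher than the leaderboard.
deltas = {task: pr - board for task, (board, pr) in scores.items()}
higher = [task for task, delta in deltas.items() if delta > 0]
lower = [task for task, delta in deltas.items() if delta < 0]

# Print tasks from the largest gain to the largest drop.
for task, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{task:40s} {delta:+.2f}")
print(f"higher: {len(higher)}, lower: {len(lower)}")
```

On these numbers, eight tasks score higher in the PR and four (SciDocsRR, SciFact, STS16, STSBenchmark) score lower, with the largest gaps on the classification tasks.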

KennethEnevoldsen

I would appreciate the metadata being added, but otherwise it looks good. Of course, let's wait until we have looked at the differences in scores.

KennethEnevoldsen · 2025-01-10T13:06:56Z

mteb/models/ru_sentence_models.py

+    use_instructions=True,
+)


Can we add the training data annotation as well? (We are going through the models and adding that.)

See #1561

They haven't published a report yet, so I don't know anything about the training dataset.

KennethEnevoldsen · 2025-01-10T13:08:36Z

mteb/models/instruct_wrapper.py

+        # to passage prompts won't be applied to passages
+        if (
+            not self.apply_instruction_to_passages
+            and prompt_type == PromptType.passage
+            and task.metadata.type == "s2p"
+        ):
+            instruction = None


Similar to jasper and nv-embed, this model doesn't use a prompt for passages. I think it could be helpful to add this to the base class.
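The condition in the hunk above can be isolated as a small standalone function. This is a minimal sketch, not the code in mteb: `PromptType` here is a local stand-in for the library's enum, and `resolve_instruction` and its `task_type` argument are hypothetical names.

```python
from enum import Enum
from typing import Optional


class PromptType(str, Enum):
    """Local stand-in for the prompt type enum referenced in the diff."""

    query = "query"
    passage = "passage"


def resolve_instruction(
    instruction: Optional[str],
    prompt_type: Optional[PromptType],
    task_type: str,
    apply_instruction_to_passages: bool = False,
) -> Optional[str]:
    """Return the instruction to use, or None when it should be skipped.

    Mirrors the condition in the diff: the instruction is dropped only for
    passages of "s2p" tasks when passage instructions are disabled.
    """
    if (
        not apply_instruction_to_passages
        and prompt_type == PromptType.passage
        and task_type == "s2p"
    ):
        return None
    return instruction


# Queries keep the instruction; s2p passages lose it by default.
print(resolve_instruction("Find answers", PromptType.query, "s2p"))
print(resolve_instruction("Find answers", PromptType.passage, "s2p"))
```

Keeping the decision in one function with a boolean flag is what would make it straightforward to share across wrappers, as suggested for the base class.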

Samoed added 6 commits January 7, 2025 14:31

add gigaembeddings

1fcd872

use jasper

47bb697

fix name

80a90c6

create sentence_transformer instruct wrapper

27df66a

apply instruction template

ddf96f4

fix jasper

b70232e

KennethEnevoldsen approved these changes Jan 10, 2025
