fix: added annotations for training data #1742
base: main
Conversation
Awesome!
Re: NQ - I think it is probably a subset of the dev split (filtered to remove queries without an answer, queries with a table as the answer, and queries with conflicting Wikipedia pages).
Updated a few cases. I am unsure whether "StackExchangeClusteringP2P" is included in the training data if you train on: If so, most sentence-transformers models would be non-zero-shot. One solution would be to remove StackExchange from the MMTEB ("beta") benchmarks. This kind of comes down to how we define zero-shot. So far I have done:
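As an illustration of the zero-shot definition being discussed, here is a minimal sketch (all names are hypothetical, not the actual mteb API): a model counts as zero-shot on a benchmark only if none of the benchmark's tasks appear in its annotated training data.

```python
# Hypothetical sketch of a zero-shot check; not the actual mteb API.
# A model is zero-shot on a benchmark if none of the benchmark's tasks
# appear among the tasks it was trained on.

def is_zero_shot(model_training_data: dict, benchmark_tasks: list) -> bool:
    """model_training_data maps task names to the splits trained on."""
    return not any(task in model_training_data for task in benchmark_tasks)

# Example: a model trained on StackExchangeClusteringP2P would not be
# zero-shot on a benchmark that includes that task.
training_data = {"StackExchangeClusteringP2P": ["test"], "MSMARCO": ["train"]}
print(is_zero_shot(training_data, ["StackExchangeClusteringP2P"]))  # False
print(is_zero_shot(training_data, ["Banking77Classification"]))     # True
```

Under this strict definition, removing StackExchange from the benchmark (rather than from the annotations) is what would restore zero-shot status for those models.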
Just for your information: we will also need to rewrite the annotations I added as task names, since I thought you would have to add the dataset paths to the models' metadata.
Added training data annotations for a variety of models.
This was quite hard to do, I must say, so something might be wrong. A review would be great, especially of the sentence-embedding training data, since that will resolve a lot of downstream cases.
We should probably tag relevant model authors here as well.
@Muennighoff I'm unsure whether the NQ test split on mteb corresponds to the train/dev split of Natural Questions. Can you also take a look at the StackExchange cases?
If the split wasn't annotated I assumed that they trained on the full dataset (including test).
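The fallback rule above (no annotated split means the full dataset, including test) could be sketched like this; the annotation format shown is illustrative, not the actual mteb schema:

```python
# Illustrative annotation format (not the actual mteb schema): each task
# name maps to the list of splits the model was trained on, or None when
# the split was not annotated.

FULL_DATASET = ["train", "dev", "test"]  # conservative assumption for None

raw_annotations = {
    "NQ": ["train"],   # split known
    "MSMARCO": None,   # split unknown -> assume full dataset, incl. test
}

training_data = {
    task: (splits if splits is not None else FULL_DATASET)
    for task, splits in raw_annotations.items()
}
print(training_data["MSMARCO"])  # ['train', 'dev', 'test']
```

This errs on the side of marking models as non-zero-shot, which seems to match the intent of the annotation effort.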
Addressed #1720
Checklist
- make test
- make lint