-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support embedders
setting
#554
base: main
Are you sure you want to change the base?
Support embedders
setting
#554
Conversation
embedders
setting
1076241
to
8ffb555
Compare
Hello @CommanderStorm Can you rebase your branch? We made changes recently to improve the library Sorry for the inconvenience, I try to review your PR as soon as possible after the rebase |
51820a9
to
8ade09d
Compare
Done ^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @CommanderStorm,
Code-wise, the PR is very nice and well-documented; I love it!
But I’m hitting way too many timeouts to really help you; sorry 😖
I don’t think we’ll be able to merge this PR with tests that take about 10 minutes. Could you mock Meilisearch as we did here:
Lines 902 to 924 in cb32534
async fn test_get_tasks_with_params() -> Result<(), Error> { | |
let mut s = mockito::Server::new_async().await; | |
let mock_server_url = s.url(); | |
let client = Client::new(mock_server_url, Some("masterKey")).unwrap(); | |
let path = | |
"/tasks?indexUids=movies,test&statuses=equeued&types=documentDeletion&uids=1&limit=0&from=1"; | |
let mock_res = s.mock("GET", path).with_status(200).create_async().await; | |
let mut query = TasksSearchQuery::new(&client); | |
query | |
.with_index_uids(["movies", "test"]) | |
.with_statuses(["equeued"]) | |
.with_types(["documentDeletion"]) | |
.with_from(1) | |
.with_limit(0) | |
.with_uids([&1]); | |
let _ = client.get_tasks_with(&query).await; | |
mock_res.assert_async().await; | |
Ok(()) |
That way, we simply ensure that meilisearch-rust sends a valid payload and hope meilisearch works as expected.
userProvided seens more brittle, but we may want change to this instead
Or we could do that and actually send the payload to meilisearch.
Or do both; let me know what you prefer, but I would very much like the tests to not take that much time to run 😖
Where in the Meilisearch codebase could I find an e2e-test how to use feature?
I don’t think there is. I believe you’re right, and we only wrote tests for user-provided vectors
introduces the experimental-vector-search-feature flag
I don’t know if this is required, @curquiza. Could we simply expose the feature as-is instead of hiding it behind a feature flag (that will make it harder to test and use).
Since it doesn’t add any dependency, I don’t see much point in putting it behind a feature flag
Yes, that's ok not to do feature flag 😊 |
0285e62
to
9033232
Compare
Time after being migrated to
I think keeping the testcases from requiring a active internet connection is better as otherwise the test might be unnessesarily flaky in CI.
I can add a testcase where I mock the routes. I have tweaked a bunch with the vectors available and can't get the It seems to always return everything when I set I have tried similar 2D vectors as the upstream test and went as far as to use 1D vectors with considerable (1/10/1k/1m) spread (=>something that as far as I understand embeddings should not match) => I am missing something. 😅 Note I can obviously steal the testcase from |
9033232
to
0e044b1
Compare
@CommanderStorm really sorry for this. |
Thanks for the pinging (github does not give me notifications for this) ^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CommanderStorm thank you again! Can you add the rest
and ollama
models we added in v1.8.0?
Sorry for the late notice again!
And thank you for your involvement 🙏
Hey @CommanderStorm, since we introduced a new |
0d1f07f
to
cbac495
Compare
The changes requested are implemented. @curquiza could you please approve a workflow run In retrospective, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @CommanderStorm,
The PR seems really good; I think we're almost ready to merge.
I think you missed one issue in your test; your definition of what a vector is was wrong, and thus, serde was ignoring most vectors when deserializing the documents.
I left a comment on the subject.
To write the test you wanted to write initially, where the vectors appear as None,
you should enable the experimental feature first.
This brings me to another point; I think every test using the feature should start by enabling it just in case they run first.
AND we should absolutely remove these two tests:
meilisearch-rust/src/features.rs
Lines 108 to 132 in c673c8d
mod tests { | |
use super::*; | |
use meilisearch_test_macro::meilisearch_test; | |
#[meilisearch_test] | |
async fn test_experimental_features_get(client: Client) { | |
let mut features = ExperimentalFeatures::new(&client); | |
features.set_vector_store(false); | |
let _ = features.update().await.unwrap(); | |
let res = features.get().await.unwrap(); | |
assert!(!res.vector_store); | |
} | |
#[meilisearch_test] | |
async fn test_experimental_features_enable_vector_store(client: Client) { | |
let mut features = ExperimentalFeatures::new(&client); | |
features.set_vector_store(true); | |
let res = features.update().await.unwrap(); | |
assert!(res.vector_store); | |
} | |
} |
That will mess with everything
I think I have adressed the requested changes ^^
I have only removed the test which disables the vector store, but have kept the test which Next time, I will split such PRs earlier ^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This updates the PR to the breaking 1.11
changes.
Docs for this feature: https://meilisearch.notion.site/Binary-quantization-usage-v1-11-2a9c9559461a4a9d9fa3e0ea5149ad68 |
@irevoire sorry to ping you. Is merging this feature in the experimental stage still something you are interested in, or would this feature need to be stabilised first? |
src/search.rs
Outdated
/// | ||
/// **Default: `false`** | ||
#[serde(skip_serializing_if = "Option::is_none")] | ||
retrieve_vectors: Option<bool>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// | |
/// **Default: `false`** | |
#[serde(skip_serializing_if = "Option::is_none")] | |
retrieve_vectors: Option<bool>, | |
#[serde(skip_serializing_if = "Option::is_none")] | |
pub retrieve_vectors: Option<bool>, |
Pull Request
Related issue
Fixes #541
Fixes #612
Fixes #621
What does this PR do?
introduces theexperimental-vector-search
-feature flagAdds the required settings
with_embedders
does use the same "API" (not usingimpl AsRef
for items passed) aswith_synonyms
, as this is the closest existingset_embedders
has not been implemented upstream (at least when I try toPATCH
the object, it does not work){get,reset}_embedders
settings have been implemented.Said implementation goes with the work done in Implement vector search experimental feature v2 (v1.6) meilisearch-python#924
adds the
hybrid
field to search via the vector search to add an end-to-end test of this feature with thehuggingface
configuration.userProvided
seens more brittle, but we may want change to this insteadusing
userProvided
instead would mean (at the cost of hardcoding stuff)=> lower cpu effort
=> no higher timeout necceeseary
=> aligning with
meilisearch/meilisearch
to only have a test foruserProvided
)Caution
This is the difficult part:
I have not gotten this feature to work as I expect it.
I have documented what I would expect to happen here:
meilisearch-rust/src/search.rs
Lines 1223 to 1304 in 86334cd
I searched a bunch, but could not find any relevant testcases (only
userProvided
)Tip
The specific testcase can be executed via
cargo test --package meilisearch-sdk --lib search::tests::test_hybrid --features=experimental-vector-search -- --exact
TODO:
PR checklist
Please check if your PR fulfills the following requirements:
Thank you so much for contributing to Meilisearch!