Merge pull request #2923 from liyun95/v2.5.x

deprecate bm25 embedding model
milvus-io · Dec 6, 2024 · bef085b
2 parents 8ff775f + eba0372

Showing 3 changed files with 2 additions and 52 deletions.
diff --git a/site/en/about/overview.md b/site/en/about/overview.md
@@ -55,7 +55,7 @@ Milvus supports various types of search functions to meet the demands of differe
 - [Filtering Search](single-vector-search.md#Filtered-search): Performs ANN search under specified filtering conditions.
 - [Range Search](single-vector-search.md#Range-search): Finds vectors within a specified radius from your query vector.
 - [Hybrid Search](multi-vector-search.md): Conducts ANN search based on multiple vector fields.
-- Keyword Search: Keyword search based on BM25.
+- [Full Text Search](full-text-search.md): Full text search based on BM25.
 - [Reranking](reranking.md): Adjusts the order of search results based on additional criteria or a secondary algorithm, refining the initial ANN search results.
 - [Fetch](get-and-scalar-query.md#Get-Entities-by-ID): Retrieves data by their primary keys.
 - [Query](get-and-scalar-query.md#Use-Basic-Operators): Retrieves data using specific expressions.
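
For reference, the BM25 ranking function named in the new Full Text Search entry is commonly written as below. This is the textbook Okapi BM25 form with the Lucene-style IDF, where the `+1` inside the logarithm keeps IDF non-negative. This diff does not state which variant or parameter defaults Milvus uses, so treat $k_1$ and $b$ as the conventional free parameters rather than Milvus settings.

$$
\operatorname{score}(D, Q) = \sum_{q_i \in Q} \operatorname{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
$$

$$
\operatorname{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
$$

Here $f(q_i, D)$ is the frequency of term $q_i$ in document $D$, $|D|$ is the length of $D$ in tokens, $\mathrm{avgdl}$ is the average document length in the corpus, $N$ is the number of documents, and $n(q_i)$ is the number of documents containing $q_i$. Typical values are $k_1 \in [1.2, 2.0]$ and $b = 0.75$.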

diff --git a/site/en/embeddings/embeddings.md b/site/en/embeddings/embeddings.md
@@ -23,7 +23,6 @@ To create embeddings in action, refer to [Using PyMilvus's Model To Generate Tex
 | ------------------------------------------------------------------------------------- | ------- | -------------------- |
 |  [openai](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/OpenAIEmbeddingFunction/OpenAIEmbeddingFunction.md)                            |  Dense  |  API                 |
 |  [sentence-transformer](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/SentenceTransformerEmbeddingFunction/SentenceTransformerEmbeddingFunction.md) |  Dense  |  Open-sourced        |
-|  [bm25](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/BM25EmbeddingFunction/BM25EmbeddingFunction.md)                                |  Sparse |  Open-sourced        |
 |  [Splade](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/SpladeEmbeddingFunction/SpladeEmbeddingFunction.md)                            |  Sparse |  Open-sourced        |
 |  [bge-m3](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/BGEM3EmbeddingFunction/BGEM3EmbeddingFunction.md)                             |  Hybrid |  Open-sourced        |
 |  [voyageai](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/VoyageEmbeddingFunction/VoyageEmbeddingFunction.md)                            |  Dense  |  API                 |
@@ -42,7 +41,7 @@ To use embedding functions with Milvus, first install the PyMilvus client librar
 pip install "pymilvus[model]"
 ```
 
-The `model` subpackage supports various embedding models, from [OpenAI](https://milvus.io/docs/embed-with-openai.md), [Sentence Transformers](https://milvus.io/docs/embed-with-sentence-transform.md), [BGE M3](https://milvus.io/docs/embed-with-bgm-m3.md), [BM25](https://milvus.io/docs/embed-with-bm25.md), to [SPLADE](https://milvus.io/docs/embed-with-splade.md) pretrained models. For simpilicity, this example uses the `DefaultEmbeddingFunction` which is __all-MiniLM-L6-v2__ sentence transformer model, the model is about 70MB and it will be downloaded during first use:
+The `model` subpackage supports various embedding models, from [OpenAI](https://milvus.io/docs/embed-with-openai.md), [Sentence Transformers](https://milvus.io/docs/embed-with-sentence-transform.md), [BGE M3](https://milvus.io/docs/embed-with-bgm-m3.md), to [SPLADE](https://milvus.io/docs/embed-with-splade.md) pretrained models. For simpilicity, this example uses the `DefaultEmbeddingFunction` which is __all-MiniLM-L6-v2__ sentence transformer model, the model is about 70MB and it will be downloaded during first use:
 
 ```python
 from pymilvus import model
@@ -121,46 +120,3 @@ bge_m3_ef = BGEM3EmbeddingFunction(use_fp16=False, device="cpu")
 docs_embeddings = bge_m3_ef(docs)
 query_embeddings = bge_m3_ef([query])
 ```
-
-## Example 3: Generate  sparse vectors using BM25 model
-
-BM25 is a well-known method that uses word occurrence frequencies to determine the relevance between queries and documents. In this example, we will show how to use `BM25EmbeddingFunction` to generate sparse embeddings for both queries and documents.
-
-First, import the __BM25EmbeddingFunction__ class.
-
-```xml
-from pymilvus.model.sparse import BM25EmbeddingFunction
-```
-
-In BM25, it's important to calculate the statistics in your documents to obtain the IDF (Inverse Document Frequency), which can represent the pattern in your documents. The IDF is a measure of how much information a word provides, that is, whether it's common or rare across all documents.
-
-```python
-# 1. prepare a small corpus to search
-docs = [
-    "Artificial intelligence was founded as an academic discipline in 1956.",
-    "Alan Turing was the first person to conduct substantial research in AI.",
-    "Born in Maida Vale, London, Turing was raised in southern England.",
-]
-query = "Where was Turing born?"
-bm25_ef = BM25EmbeddingFunction()
-
-# 2. fit the corpus to get BM25 model parameters on your documents.
-bm25_ef.fit(docs)
-
-# 3. store the fitted parameters to disk to expedite future processing.
-bm25_ef.save("bm25_params.json")
-
-# 4. load the saved params
-new_bm25_ef = BM25EmbeddingFunction()
-new_bm25_ef.load("bm25_params.json")
-
-docs_embeddings = new_bm25_ef.encode_documents(docs)
-query_embeddings = new_bm25_ef.encode_queries([query])
-print("Dim:", new_bm25_ef.dim, list(docs_embeddings)[0].shape)
-```
-
-The expected output is similar to the following:
-
-```python
-Dim: 21 (1, 21)
-```
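
The removed example fitted corpus statistics, saved them to JSON, reloaded them, and encoded documents and queries as sparse vectors. The sketch below reproduces that fit/save/load/encode flow without any dependency, so the mechanics stay visible now that the section is gone. `TinyBM25`, `sparse_dot`, and the tokenizer are hypothetical names for illustration, not `pymilvus` APIs. The sketch assumes naive whitespace tokenization, the Lucene-style IDF, and a split in which the query vector carries IDF weights and the document vector carries the saturated term-frequency part, so their dot product equals the BM25 score. Its dimensions and weights will differ from those produced by `BM25EmbeddingFunction`.

```python
import json
import math
import os
import tempfile
from collections import Counter


class TinyBM25:
    """Hypothetical dependency-free BM25 encoder; NOT pymilvus' BM25EmbeddingFunction."""

    def __init__(self, k1: float = 1.5, b: float = 0.75):
        self.k1 = k1  # term-frequency saturation (assumed conventional value)
        self.b = b  # strength of document-length normalization
        self.vocab: dict[str, int] = {}  # term -> sparse dimension index
        self.idf: dict[str, float] = {}  # term -> inverse document frequency
        self.avgdl = 0.0  # average document length in tokens

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Naive tokenizer: split on whitespace, strip basic punctuation, lowercase.
        tokens = (t.strip(".,?!").lower() for t in text.split())
        return [t for t in tokens if t]

    def fit(self, docs: list[str]) -> None:
        # Collect corpus statistics: document frequency per term and average length.
        tokenized = [self.tokenize(d) for d in docs]
        n_docs = len(tokenized)
        df = Counter(term for toks in tokenized for term in set(toks))
        self.vocab = {term: i for i, term in enumerate(sorted(df))}
        # Lucene-style IDF: rare terms get large weights, common terms small positive ones.
        self.idf = {
            term: math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)
            for term, n in df.items()
        }
        self.avgdl = sum(len(toks) for toks in tokenized) / n_docs

    def save(self, path: str) -> None:
        # Persist everything needed to encode new text consistently later.
        params = {"k1": self.k1, "b": self.b, "vocab": self.vocab,
                  "idf": self.idf, "avgdl": self.avgdl}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(params, f)

    def load(self, path: str) -> None:
        # Restore parameters written by save().
        with open(path, encoding="utf-8") as f:
            params = json.load(f)
        self.k1, self.b = params["k1"], params["b"]
        self.vocab, self.idf = params["vocab"], params["idf"]
        self.avgdl = params["avgdl"]

    def encode_document(self, doc: str) -> dict[int, float]:
        # Document side: saturated, length-normalized term frequency per known term.
        tokens = self.tokenize(doc)
        norm = self.k1 * (1 - self.b + self.b * len(tokens) / self.avgdl)
        return {
            self.vocab[t]: f * (self.k1 + 1) / (f + norm)
            for t, f in Counter(tokens).items()
            if t in self.vocab
        }

    def encode_query(self, query: str) -> dict[int, float]:
        # Query side: IDF weight for each distinct known query term; unknown terms are dropped.
        return {self.vocab[t]: self.idf[t]
                for t in set(self.tokenize(query)) if t in self.vocab}


def sparse_dot(q: dict[int, float], d: dict[int, float]) -> float:
    # Dot product of two sparse vectors stored as {index: weight}.
    return sum(w * d.get(i, 0.0) for i, w in q.items())


docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]
query = "Where was Turing born?"

bm25 = TinyBM25()
bm25.fit(docs)

# Round-trip the fitted parameters through a temporary JSON file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bm25_params.json")
    bm25.save(path)
    reloaded = TinyBM25()
    reloaded.load(path)

query_vec = reloaded.encode_query(query)
doc_vecs = [reloaded.encode_document(d) for d in docs]
scores = [sparse_dot(query_vec, d) for d in doc_vecs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print("Vocabulary size:", len(reloaded.vocab))
print("Ranking (best first):", ranking)
```

With these three documents, the third one ranks first because it is the only one containing both `born` and `turing`. Persisting only `vocab`, `idf`, and `avgdl` is enough to encode new text against the same corpus statistics, which is the reason the removed example saved its fitted parameters to disk.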
diff --git a/site/en/menuStructure/en.json b/site/en/menuStructure/en.json
@@ -777,12 +777,6 @@
             "order": 3,
             "children": []
           },
-          {
-            "label": "BM25",
-            "id": "embed-with-bm25.md",
-            "order": 4,
-            "children": []
-          },
           {
             "label": "SPLADE",
             "id": "embed-with-splade.md",