update docs
Signed-off-by: AnthonyTsu1984 <[email protected]>
AnthonyTsu1984 committed Dec 6, 2024
1 parent 1e794d8 commit 064f281
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions site/en/tutorials/use_ColPali_with_milvus.md
@@ -15,7 +15,7 @@ title: Use ColPali for Multi-Modal Retrieval with Milvus

Modern retrieval models typically use a single embedding to represent text or images. ColBERT, however, is a neural model that utilizes a list of embeddings for each data instance and employs a "MaxSim" operation to calculate the similarity between two texts. Beyond textual data, figures, tables, and diagrams also contain rich information, which is often disregarded in text-based information retrieval.

-![](../../../assets/colpali_formula.png)
+![](../../../images/colpali_formula.png)

The MaxSim function compares a query with a document (the item being searched) by looking at their token embeddings. For each word in the query, it picks the most similar word from the document (using cosine similarity or squared L2 distance) and sums these maximum similarities across all words in the query.
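
The MaxSim scoring described here can be sketched in a few lines of plain Python. This is a minimal illustration, not the tutorial's implementation: the helper names `cosine` and `max_sim` and the toy two-dimensional vectors are made up for the example, cosine similarity is used as the similarity measure, and all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def max_sim(query_vecs, doc_vecs):
    """For each query token, take its best match in the document, then sum."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical toy embeddings: one vector per token (or image patch).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match for each query token

score_a = max_sim(query, doc_a)
score_b = max_sim(query, doc_b)
print(score_a, score_b)
```

Because each query token contributes only its single best match, `doc_a` scores higher than `doc_b` even though it contains vectors that are irrelevant to one of the query tokens.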

@@ -27,7 +27,6 @@ ColPali is a method that combines ColBERT's multi-vector representation with Pal

## Preparation


```shell
$ pip install pdf2image
$ pip install pymilvus
@@ -61,6 +60,7 @@ import concurrent.futures
client = MilvusClient(uri="milvus.db")
```


<div class="alert note">

- If you only need a local vector database for small-scale data or prototyping, setting the uri to a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically uses [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in this file.
@@ -167,7 +167,7 @@ class MilvusColbertRetriever:
# Rerank a single document by retrieving its embeddings and calculating the similarity with the query.
doc_colbert_vecs = client.query(
collection_name=collection_name,
-filter=f"doc_id in [{doc_id}, {doc_id + 1}]",
+filter=f"doc_id in [{doc_id}]",
output_fields=["seq_id", "vector", "doc"],
limit=1000,
)
