diff --git a/README.md b/README.md
index d9f8042..768423f 100644
--- a/README.md
+++ b/README.md
@@ -527,6 +527,8 @@ Achieving good search in large-scale systems involves a combination of efficient
 - Using a more complex retriever to generate a set of candidate documents, models such as [ColBERT](https://arxiv.org/abs/2004.12832) and [BGE-M3](https://arxiv.org/abs/2402.03216) are often used for this purpose.
 - Using a reranker to re-rank the candidate documents generated by the retriever, models such as [mixedbread-ai/mxbai-rerank-large-v1](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v1) and [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) are often used for this purpose.
 - Combining the results of multiple retrieval methodologies such as dense and sparse using [Reciprocal Rank Fusion](https://www.assembled.com/blog/better-rag-results-with-reciprocal-rank-fusion-and-hybrid-search)
+- Implementing query understanding and decomposition to break complex queries into simpler sub-queries, then recombining the results.
+- Using metadata to filter results; no real system relies on semantic and vector similarity alone.
 You might notice that the entire process is done in phases of increasing complexity, this is known as phased ranking or multistage retrieval.
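The Reciprocal Rank Fusion step mentioned in the diff can be sketched in a few lines. This is a minimal illustration of the standard RRF formula (score = Σ 1/(k + rank) across ranked lists); the function name and the common k=60 default are this sketch's choices, not taken from any particular library:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of document ids into one ranking.

    rankings: list of ranked lists, e.g. one from a dense retriever
    and one from a sparse (BM25-style) retriever. Each document's
    fused score is the sum of 1 / (k + rank) over every list it
    appears in; higher is better.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document ids by fused score, best first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from two retrieval methodologies:
dense = ["d1", "d2", "d3"]
sparse = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([dense, sparse]))  # → ['d1', 'd3', 'd2', 'd4']
```

"d1" wins because it ranks highly in both lists, while "d3" benefits from topping the sparse list; documents appearing in only one list are naturally demoted.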