From 143d2ff88f95634ec0b00f7bfc6251e34b39f92a Mon Sep 17 00:00:00 2001 From: Li Yin Date: Fri, 12 Jul 2024 07:04:44 -0700 Subject: [PATCH 01/12] quick fix of retriever format and component's sequential change --- .github/workflows/documentation.yml | 2 +- docs/source/tutorials/agent.rst | 2 ++ docs/source/tutorials/component.rst | 6 +++--- docs/source/tutorials/index.rst | 5 ++--- docs/source/tutorials/retriever.rst | 23 +++++++++++++++-------- 5 files changed, 23 insertions(+), 15 deletions(-) diff --git a/.github/workflows/documentation.yml b/.github/workflows/documentation.yml index fd2d6971..aecce85b 100644 --- a/.github/workflows/documentation.yml +++ b/.github/workflows/documentation.yml @@ -3,7 +3,7 @@ name: Documentation on: push: branches: - - release # Trigger the workflow when changes are pushed to the release branch + - li # Trigger the workflow when changes are pushed to the release branch permissions: contents: write diff --git a/docs/source/tutorials/agent.rst b/docs/source/tutorials/agent.rst index 26afd674..28f47ab7 100644 --- a/docs/source/tutorials/agent.rst +++ b/docs/source/tutorials/agent.rst @@ -504,6 +504,8 @@ The above example will be formated as: **Subclass ReActAgent** If you want to customize the agent further, you can subclass the :class:`ReActAgent` and override the methods you want to change. + + .. .. figure:: /_static/images/query_1.png .. :align: center .. :alt: DataClass diff --git a/docs/source/tutorials/component.rst b/docs/source/tutorials/component.rst index 02ade72e..8e6a152c 100644 --- a/docs/source/tutorials/component.rst +++ b/docs/source/tutorials/component.rst @@ -253,7 +253,7 @@ Using a decorator is an even more convenient way to create a component from a fu .. code-block:: python - .. @fun_to_component + @fun_to_component def add_one(x): return x + 1 @@ -275,7 +275,7 @@ Let's put the `FunComponent`` and `DocQA`` together in a sequence: .. code-block:: python - from lightrag.core.component import Sequential + from lightrag.core.container import Sequential @fun_to_component def enhance_query(query:str) -> str: @@ -318,7 +318,7 @@ The structure of the sequence using ``print(seq)``: - :class:`core.component.Component` - :class:`core.component.FunComponent` - - :class:`core.component.Sequential` + - :class:`core.container.Sequential` - :func:`core.component.fun_to_component` diff --git a/docs/source/tutorials/index.rst b/docs/source/tutorials/index.rst index 03440921..fa1f251d 100644 --- a/docs/source/tutorials/index.rst +++ b/docs/source/tutorials/index.rst @@ -59,7 +59,7 @@ Additionally, what shines in LightRAG is that all orchestrator components, like You can easily make each component work with different models from different providers by switching out the `ModelClient` and its `model_kwargs`. -We will introduce the libraries starting from the core base classes, then move to the RAG essentials, and finally to the agent essentials. +We will introduce the library starting from the core base classes, then move to the RAG essentials, and finally to the agent essentials. With these building blocks, we will further introduce optimizing, where the optimizer uses building blocks such as Generator for auto-prompting and retriever for dynamic few-shot in-context learning (ICL). Building @@ -126,8 +126,7 @@ Code path: :ref:`lightrag.core`. For abstract classes: * - :doc:`embedder` - The component that orchestrates model client (Embedding models in particular) and output processors. * - :doc:`retriever` - - The base class for all retrievers who in particular retrieve relevant documents from a given database to add **context** to the generator. - + - The base class for all retrievers, which in particular retrieve relevant documents from a given database to add *context* to the generator. Data Pipeline and Storage ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/docs/source/tutorials/retriever.rst b/docs/source/tutorials/retriever.rst index 6fe4e99b..d403f989 100644 --- a/docs/source/tutorials/retriever.rst +++ b/docs/source/tutorials/retriever.rst @@ -83,7 +83,7 @@ LightRAG library does not prioritize the coverage of integration for the followi Instead, our design goals are: -1. Representative and valable coverage: +1. Cover representative and valuable retriever methods: a. High-precision retrieval methods and enabling them to work locally and in-memory so that researchers and developers can build and test more efficiently. b. Showcase how to work with cloud databases for large-scale data, utilizing their built-in search and filter methods. @@ -254,7 +254,7 @@ In this note, we will use the following documents and queries for demonstration: The first query should retrieve the first and the last document, and the second query should retrieve the second and the third document. FAISSRetriever -^^^^^^^^^^^^^^^^^^^^^^^^ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ First, let's do semantic search, here we will use in-memory :class:`FAISSRetriever`. FAISS retriever takes embeddings which can be ``List[float]`` or ``np.ndarray`` and build an index using FAISS library. The query can take both embeddings and str formats. @@ -334,7 +334,7 @@ In default, the score is a simulated probabity in range ``[0, 1]`` using consine You can check the retriever for more type of scores. BM25Retriever -^^^^^^^^^^^^^^^^^^^^^^^^ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ So the semantic search works pretty well. We will see how :class:`BM25Retriever` works in comparison. We reimplemented the code in [9]_ with one improvement: instead of using ``text.split(" ")``, we use tokenizer to split the text. Here is a comparison of how they different: @@ -408,7 +408,8 @@ This time the retrieval gives us the right answer. [RetrieverOutput(doc_indices=[2, 1], doc_scores=[0.5343238380789569, 0.4568096570283078], query='solar panels?', documents=None)] Reranker as Retriever -^^^^^^^^^^^^^^^^^^^^^^^^ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Semantic search works well, and reranker basd on mostly `cross-encoder` model is supposed to work even better. We have integrated two rerankers: ``BAAI/bge-reranker-base`` [10]_ hosted on ``transformers`` and rerankers provided by ``Cohere`` [11]_. These models follow the ``ModelClient`` protocol and are directly accessible as retriever from :class:`RerankerRetriever`. @@ -518,7 +519,8 @@ Also, if we use both the `title` and `content`, it will also got the right respo LLM as Retriever -^^^^^^^^^^^^^^^^^^^^^^^^ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + There are differen ways to use LLM as a retriever: @@ -598,12 +600,16 @@ The response is: [RetrieverOutput(doc_indices=[1, 2], doc_scores=None, query='How do solar panels impact the environment?', documents=None)] + PostgresRetriever -^^^^^^^^^^^^^^^^^^^^^^^^ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Coming soon. Use Score Threshold instead of top_k -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + In some cases, when the retriever has a computed score and you might prefer to use the score instead of ``top_k`` to filter out the relevant documents. To do so, you can simplify set the ``top_k`` to the full size of the documents and use a post-processing step or a component(to chain with the retriever) to filter out the documents with the score below the threshold. @@ -613,7 +619,8 @@ Use together with Database When the scale of data is large, we will use a database to store the computed embeddings and indexes from the documents. With LocalDB -^^^^^^^^^^^^^^^^^^^^^^^^ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + We have previously computed embeddings, now let us :class:`LocalDB` to help with the persistence. (Although you can totally persist them yourself such as using pickle). Additionally, ``LocalDB`` help us keep track of our initial documents and its transformed documents. From b2f2e1858b821dbfd860f658d4a9e8fc524ca492 Mon Sep 17 00:00:00 2001 From: Li Yin Date: Fri, 12 Jul 2024 08:09:33 -0700 Subject: [PATCH 02/12] update read me --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 4780c735..be4800ee 100644 --- a/README.md +++ b/README.md @@ -24,7 +24,8 @@ LLMs are like water; they can be shaped into anything, from GenAI applications s Because of this, no library can provide out-of-the-box solutions. Users must build toward their own use case. This requires the library to be modular, robust, and have a clean, readable codebase. The only code you should put into production is code you either 100% trust or are 100% clear about how to customize and iterate. -LightRAG is born to be light, modular, and robust, with a 100% readable codebase. +This is what LightRAG is: light, modular, and robust, with a 100% readable codebase. + Further reading: [Introduction](https://lightrag.sylph.ai/), [Design Philosophy](https://lightrag.sylph.ai/tutorials/lightrag_design_philosophy.html) and [Class hierarchy](https://lightrag.sylph.ai/tutorials/class_hierarchy.html). From 264933ba1d2a1edad268eabc60997b1893925632 Mon Sep 17 00:00:00 2001 From: Li Yin Date: Fri, 12 Jul 2024 19:37:57 -0700 Subject: [PATCH 03/12] fix the logo on discord in document home --- docs/source/index.rst | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 8e113932..f4104594 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -12,7 +12,7 @@ PyPI Version GitHub Stars - Discord + Discord License @@ -244,9 +244,11 @@ We are building a library that unites the two worlds, forming a healthy LLM appl .. resources/index .. hide the for contributors now -.. :glob: -.. :maxdepth: 1 -.. :caption: For Contributors -.. :hidden: -.. contributor/index +.. toctree:: + :glob: + :maxdepth: 1 + :caption: For Contributors + :hidden: + + contributor/index From 0b9a22d157c4969dda83994f0d6faafd27fd343e Mon Sep 17 00:00:00 2001 From: Li Yin Date: Fri, 12 Jul 2024 20:11:17 -0700 Subject: [PATCH 04/12] fix the grammar error at readme --- README.md | 2 +- docs/source/apis/components/index.rst | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index be4800ee..64973cac 100644 --- a/README.md +++ b/README.md @@ -22,7 +22,7 @@ It is *light*, *modular*, and *robust*, with a 100% readable codebase. LLMs are like water; they can be shaped into anything, from GenAI applications such as chatbots, translation, summarization, code generation, and autonomous agents to classical NLP tasks like text classification and named entity recognition. They interact with the world beyond the model’s internal knowledge via retrievers, memory, and tools (function calls). Each use case is unique in its data, business logic, and user experience. -Because of this, no library can provide out-of-the-box solutions. Users must build toward their own use case. This requires the library to be modular, robust, and have a clean, readable codebase. The only code you should put into production is code you either 100% trust or are 100% clear about how to customize and iterate. +Because of this, no library can provide out-of-the-box solutions. Users must build towards their own use case. This requires the library to be modular, robust, and have a clean, readable codebase. The only code you should put into production is code you either 100% trust or are 100% clear about how to customize and iterate. This is what LightRAG is: light, modular, and robust, with a 100% readable codebase. diff --git a/docs/source/apis/components/index.rst b/docs/source/apis/components/index.rst index 4b31f774..089a42f3 100644 --- a/docs/source/apis/components/index.rst +++ b/docs/source/apis/components/index.rst @@ -82,6 +82,7 @@ Reasoning .. toctree:: :maxdepth: 1 + :hidden: components.model_client components.retriever From 71ed1fe57eb426700383fac76b6f4a23d43359b2 Mon Sep 17 00:00:00 2001 From: Li Yin Date: Sat, 13 Jul 2024 17:53:15 -0700 Subject: [PATCH 05/12] update the readme and remove star counts from the file --- README.md | 40 ++++++-- docs/source/index.rst | 15 ++- lightrag/README.md | 220 +++++++++++++++++++++++++++++++++++++----- 3 files changed, 235 insertions(+), 40 deletions(-) diff --git a/README.md b/README.md index 64973cac..57030906 100644 --- a/README.md +++ b/README.md @@ -2,10 +2,10 @@ + [![License](https://img.shields.io/github/license/SylphAI-Inc/LightRAG)](https://opensource.org/license/MIT) [![PyPI](https://img.shields.io/pypi/v/lightRAG?style=flat-square)](https://pypi.org/project/lightRAG/) [![PyPI - Downloads](https://img.shields.io/pypi/dm/lightRAG?style=flat-square)](https://pypistats.org/packages/lightRAG) -[![GitHub star chart](https://img.shields.io/github/stars/SylphAI-Inc/LightRAG?style=flat-square)](https://star-history.com/#SylphAI-Inc/LightRAG) [![Open Issues](https://img.shields.io/github/issues-raw/SylphAI-Inc/LightRAG?style=flat-square)](https://github.com/SylphAI-Inc/LightRAG/issues) [![](https://dcbadge.vercel.app/api/server/zt2mTPcu?compact=true&style=flat)](https://discord.gg/zt2mTPcu) @@ -58,8 +58,11 @@ class Net(nn.Module): ``` --> # LightRAG Task Pipeline +We will ask the model to respond with ``explanation`` and ``example`` of a concept. To achieve this, we will build a simple pipeline to get the structured output as ``QAOutput``. -We will ask the model to respond with ``explanation`` and ``example`` of a concept. And we will build a pipeline to get the structured output as ``QAOutput``. +## Well-designed Base Classes + +This leverages our two and only powerful base classes: `Component` as building blocks for the pipeline and `DataClass` to ease the data interaction with LLMs. ```python @@ -120,9 +123,9 @@ output = qa("What is LLM?") print(output) ``` -**Structure of the pipeline** +## Clear Pipeline Structure -Here is what we get from ``print(qa)``: +Simply by using `print(qa)`, you can see the pipeline structure, which helps users understand any LLM workflow quickly. ``` QA( @@ -162,16 +165,17 @@ QA( ) ``` -**The output** +**The Output** +We structure the output to both track the data and potential errors if any part of the Generator component fails. Here is what we get from ``print(output)``: ``` GeneratorOutput(data=QAOutput(explanation='LLM stands for Large Language Model, which refers to a type of artificial intelligence designed to process and generate human-like language.', example='For instance, LLMs are used in chatbots and virtual assistants, such as Siri and Alexa, to understand and respond to natural language input.'), error=None, usage=None, raw_response='```\n{\n "explanation": "LLM stands for Large Language Model, which refers to a type of artificial intelligence designed to process and generate human-like language.",\n "example": "For instance, LLMs are used in chatbots and virtual assistants, such as Siri and Alexa, to understand and respond to natural language input."\n}', metadata=None) ``` -**See the prompt** +**Focus on the Prompt** -Use the following code: +Use the following code will let us see the prompt after it is formatted: ```python @@ -204,6 +208,28 @@ User: What is LLM? You: ```` +## Model-agnostic + + +You can switch to any model simply by using a different model_client (provider) and model_kwargs. +Let's use OpenAI's gpt-3.5-turbo model on the same pipeline. + + +You can switch to any model simply by using a different `model_client` (provider) and `model_kwargs`. +Let's use OpenAI's `gpt-3.5-turbo` model. + +```python +from lightrag.components.model_client import OpenAIClient + +self.generator = Generator( + model_client=OpenAIClient(), + model_kwargs={"model": "gpt-3.5-turbo"}, + template=qa_template, + prompt_kwargs={"output_format_str": parser.format_instructions()}, + output_processors=parser, +) +``` + # Quick Install diff --git a/docs/source/index.rst b/docs/source/index.rst index f4104594..864913a7 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -9,14 +9,14 @@ .. raw:: html
- PyPI Version - GitHub Stars Discord License
+.. GitHub Stars + .. PyPI Downloads @@ -245,10 +245,9 @@ We are building a library that unites the two worlds, forming a healthy LLM appl .. hide the for contributors now -.. toctree:: - :glob: - :maxdepth: 1 - :caption: For Contributors - :hidden: + .. :glob: + .. :maxdepth: 1 + .. :caption: For Contributors + .. :hidden: - contributor/index + .. contributor/index diff --git a/lightrag/README.md b/lightrag/README.md index 71b3294c..57030906 100644 --- a/lightrag/README.md +++ b/lightrag/README.md @@ -1,12 +1,36 @@ ![LightRAG Logo](https://raw.githubusercontent.com/SylphAI-Inc/LightRAG/main/docs/source/_static/images/LightRAG-logo-doc.jpeg) + + + +[![License](https://img.shields.io/github/license/SylphAI-Inc/LightRAG)](https://opensource.org/license/MIT) +[![PyPI](https://img.shields.io/pypi/v/lightRAG?style=flat-square)](https://pypi.org/project/lightRAG/) +[![PyPI - Downloads](https://img.shields.io/pypi/dm/lightRAG?style=flat-square)](https://pypistats.org/packages/lightRAG) +[![Open Issues](https://img.shields.io/github/issues-raw/SylphAI-Inc/LightRAG?style=flat-square)](https://github.com/SylphAI-Inc/LightRAG/issues) +[![](https://dcbadge.vercel.app/api/server/zt2mTPcu?compact=true&style=flat)](https://discord.gg/zt2mTPcu) + ### ⚡ The Lightning Library for Large Language Model Applications ⚡ -*LightRAG* helps developers with both building and optimizing *Retriever-Agent-Generator (RAG)* pipelines. -It is *light*, *modular*, and *robust*. +*LightRAG* helps developers with both building and optimizing *Retriever-Agent-Generator* pipelines. +It is *light*, *modular*, and *robust*, with a 100% readable codebase. + + + + +# Why LightRAG? + +LLMs are like water; they can be shaped into anything, from GenAI applications such as chatbots, translation, summarization, code generation, and autonomous agents to classical NLP tasks like text classification and named entity recognition. They interact with the world beyond the model’s internal knowledge via retrievers, memory, and tools (function calls). Each use case is unique in its data, business logic, and user experience. + +Because of this, no library can provide out-of-the-box solutions. Users must build towards their own use case. This requires the library to be modular, robust, and have a clean, readable codebase. The only code you should put into production is code you either 100% trust or are 100% clear about how to customize and iterate. + +This is what LightRAG is: light, modular, and robust, with a 100% readable codebase. +Further reading: [Introduction](https://lightrag.sylph.ai/), [Design Philosophy](https://lightrag.sylph.ai/tutorials/lightrag_design_philosophy.html) and [Class hierarchy](https://lightrag.sylph.ai/tutorials/class_hierarchy.html). + + + +# LightRAG Task Pipeline + +We will ask the model to respond with ``explanation`` and ``example`` of a concept. To achieve this, we will build a simple pipeline to get the structured output as ``QAOutput``. -**LightRAG** +## Well-designed Base Classes + +This leverages our two and only powerful base classes: `Component` as building blocks for the pipeline and `DataClass` to ease the data interaction with LLMs. ```python -from lightrag.core import Component, Generator +from dataclasses import dataclass, field + +from lightrag.core import Component, Generator, DataClass from lightrag.components.model_client import GroqAPIClient -from lightrag.utils import setup_env #noqa +from lightrag.components.output_parsers import JsonOutputParser -class SimpleQA(Component): - def __init__(self): - super().__init__() - template = r""" +@dataclass +class QAOutput(DataClass): + explanation: str = field( + metadata={"desc": "A brief explanation of the concept in one sentence."} + ) + example: str = field(metadata={"desc": "An example of the concept in a sentence."}) + + + +qa_template = r""" +You are a helpful assistant. + +{{output_format_str}} + + +User: {{input_str}} +You:""" + +class QA(Component): + def __init__(self): + super().__init__() + + parser = JsonOutputParser(data_class=QAOutput, return_data_class=True) + self.generator = Generator( + model_client=GroqAPIClient(), + model_kwargs={"model": "llama3-8b-8192"}, + template=qa_template, + prompt_kwargs={"output_format_str": parser.format_instructions()}, + output_processors=parser, + ) + + def call(self, query: str): + return self.generator.call({"input_str": query}) + + async def acall(self, query: str): + return await self.generator.acall({"input_str": query}) +``` + + +Run the following code for visualization and calling the model. + +```python + +qa = QA() +print(qa) + +# call +output = qa("What is LLM?") +print(output) +``` + +## Clear Pipeline Structure + +Simply by using `print(qa)`, you can see the pipeline structure, which helps users understand any LLM workflow quickly. + +``` +QA( + (generator): Generator( + model_kwargs={'model': 'llama3-8b-8192'}, + (prompt): Prompt( + template: You are a helpful assistant. + + {{output_format_str}} + User: {{input_str}} - You: - """ - self.generator = Generator( - model_client=GroqAPIClient(), - model_kwargs={"model": "llama3-8b-8192"}, - template=template, + You:, prompt_kwargs: {'output_format_str': 'Your output should be formatted as a standard JSON instance with the following schema:\n```\n{\n "explanation": "A brief explanation of the concept in one sentence. (str) (required)",\n "example": "An example of the concept in a sentence. (str) (required)"\n}\n```\n-Make sure to always enclose the JSON output in triple backticks (```). Please do not add anything other than valid JSON output!\n-Use double quotes for the keys and string values.\n-Follow the JSON formatting conventions.'}, prompt_variables: ['output_format_str', 'input_str'] + ) + (model_client): GroqAPIClient() + (output_processors): JsonOutputParser( + data_class=QAOutput, examples=None, exclude_fields=None, return_data_class=True + (json_output_format_prompt): Prompt( + template: Your output should be formatted as a standard JSON instance with the following schema: + ``` + {{schema}} + ``` + {% if example %} + Examples: + ``` + {{example}} + ``` + {% endif %} + -Make sure to always enclose the JSON output in triple backticks (```). Please do not add anything other than valid JSON output! + -Use double quotes for the keys and string values. + -Follow the JSON formatting conventions., prompt_variables: ['schema', 'example'] ) + (output_processors): JsonParser() + ) + ) +) +``` + +**The Output** + +We structure the output to both track the data and potential errors if any part of the Generator component fails. +Here is what we get from ``print(output)``: + +``` +GeneratorOutput(data=QAOutput(explanation='LLM stands for Large Language Model, which refers to a type of artificial intelligence designed to process and generate human-like language.', example='For instance, LLMs are used in chatbots and virtual assistants, such as Siri and Alexa, to understand and respond to natural language input.'), error=None, usage=None, raw_response='```\n{\n "explanation": "LLM stands for Large Language Model, which refers to a type of artificial intelligence designed to process and generate human-like language.",\n "example": "For instance, LLMs are used in chatbots and virtual assistants, such as Siri and Alexa, to understand and respond to natural language input."\n}', metadata=None) +``` +**Focus on the Prompt** + +Use the following code will let us see the prompt after it is formatted: + +```python + +qa2.generator.print_prompt( + output_format_str=qa2.generator.output_processors.format_instructions(), + input_str="What is LLM?", +) +``` + + +The output will be: + +````markdown + +You are a helpful assistant. + +Your output should be formatted as a standard JSON instance with the following schema: +``` +{ + "explanation": "A brief explanation of the concept in one sentence. (str) (required)", + "example": "An example of the concept in a sentence. (str) (required)" +} +``` +-Make sure to always enclose the JSON output in triple backticks (```). Please do not add anything other than valid JSON output! +-Use double quotes for the keys and string values. +-Follow the JSON formatting conventions. + + +User: What is LLM? +You: +```` + +## Model-agnostic - def call(self, query): - return self.generator({"input_str": query}) - async def acall(self, query): - return await self.generator.acall({"input_str": query}) +You can switch to any model simply by using a different model_client (provider) and model_kwargs. +Let's use OpenAI's gpt-3.5-turbo model on the same pipeline. + + +You can switch to any model simply by using a different `model_client` (provider) and `model_kwargs`. +Let's use OpenAI's `gpt-3.5-turbo` model. + +```python +from lightrag.components.model_client import OpenAIClient + +self.generator = Generator( + model_client=OpenAIClient(), + model_kwargs={"model": "gpt-3.5-turbo"}, + template=qa_template, + prompt_kwargs={"output_format_str": parser.format_instructions()}, + output_processors=parser, +) ``` -## Quick Install + +# Quick Install Install LightRAG with pip: @@ -75,20 +243,22 @@ Please refer to the [full installation guide](https://lightrag.sylph.ai/get_star + # Documentation LightRAG full documentation available at [lightrag.sylph.ai](https://lightrag.sylph.ai/): - [Introduction](https://lightrag.sylph.ai/) - [Full installation guide](https://lightrag.sylph.ai/get_started/installation.html) -- [Design philosophy](https://lightrag.sylph.ai/tutorials/lightrag_design_philosophy.html): Design based on three principles: Simplicity over complexity, Quality over quantity, and Optimizing over building. -- [Class hierarchy](https://lightrag.sylph.ai/tutorials/class_hierarchy.html): We have no more than two levels of subclasses. The bare minimum abstraction provides developers with maximum customizability and simplicity. -- [Tutorials](https://lightrag.sylph.ai/tutorials/index.html): Learn the `why` and `how-to` (customize and integrate) behind each core part within the `LightRAG` library. +- [Design philosophy](https://lightrag.sylph.ai/tutorials/lightrag_design_philosophy.html) +- [Class hierarchy](https://lightrag.sylph.ai/tutorials/class_hierarchy.html) +- [Tutorials](https://lightrag.sylph.ai/tutorials/index.html) - [API reference](https://lightrag.sylph.ai/apis/index.html) -## Contributors + +# Contributors [![contributors](https://contrib.rocks/image?repo=SylphAI-Inc/LightRAG&max=2000)](https://github.com/SylphAI-Inc/LightRAG/graphs/contributors) From 1a6f349a4bc0510ce2f6246d1926876fbf5ff6da Mon Sep 17 00:00:00 2001 From: Li Yin Date: Sun, 14 Jul 2024 12:03:55 -0700 Subject: [PATCH 06/12] add a call and address mypy error for sequential --- .github/workflows/documentation.yml | 2 +- docs/source/tutorials/generator.rst | 7 +-- docs/source/tutorials/retriever.rst | 66 +++++++++++++++-------------- lightrag/CHANGELOG.md | 3 ++ lightrag/lightrag/core/container.py | 58 +++++++++++++++++++------ 5 files changed, 88 insertions(+), 48 deletions(-) diff --git a/.github/workflows/documentation.yml b/.github/workflows/documentation.yml index aecce85b..fd2d6971 100644 --- a/.github/workflows/documentation.yml +++ b/.github/workflows/documentation.yml @@ -3,7 +3,7 @@ name: Documentation on: push: branches: - - li # Trigger the workflow when changes are pushed to the release branch + - release # Trigger the workflow when changes are pushed to the release branch permissions: contents: write diff --git a/docs/source/tutorials/generator.rst b/docs/source/tutorials/generator.rst index 60b6bdc9..369807b2 100644 --- a/docs/source/tutorials/generator.rst +++ b/docs/source/tutorials/generator.rst @@ -12,7 +12,7 @@ Generator `Generator` is a user-facing orchestration component with a simple and unified interface for LLM prediction. -It is a pipeline consisting of three subcomponents. +It is a pipeline consisting of three subcomponents. By switching the prompt template, model client, and output parser, users have full control and flexibility. Design --------------------------------------- @@ -26,11 +26,10 @@ Design - The :class:`Generator` is designed to achieve the following goals: 1. Model Agnostic: The Generator should be able to call any LLM model with the same prompt. -2. Unified Interface: It should manage the pipeline from prompt(input)->model call -> output parsing. +2. Unified interface: It manages the pipeline from prompt (input) -> model call -> output parsing, while still giving users full control over each part. 3. Unified Output: This will make it easy to log and save records of all LLM predictions. 4. Work with Optimizer: It should be able to work with Optimizer to optimize the prompt. @@ -443,6 +442,7 @@ Besides these examples, LLM is like water, even in our library, we have componen - :class:`LLMRetriever` is a retriever that uses Generator to call LLM to retrieve the most relevant documents. - :class:`DefaultLLMJudge` is a judge that uses Generator to call LLM to evaluate the quality of the response. - :class:`LLMOptimizer` is an optimizer that uses Generator to call LLM to optimize the prompt. +- :class:`ReAct Agent Planner` is an LLM planner that uses Generator to plan and to call functions in ReAct Agent. Tracing --------------------------------------- @@ -479,6 +479,7 @@ Coming soon! - :class:`tracing.generator_call_logger.GeneratorCallLogger` - :class:`tracing.generator_state_logger.GeneratorStateLogger` - :class:`components.retriever.llm_retriever.LLMRetriever` + - :class:`components.agent.react.ReActAgent` - :class:`eval.llm_as_judge.DefaultLLMJudge` - :class:`optim.llm_optimizer.LLMOptimizer` - :func:`utils.config.new_component` diff --git a/docs/source/tutorials/retriever.rst b/docs/source/tutorials/retriever.rst index d403f989..4b8d1bae 100644 --- a/docs/source/tutorials/retriever.rst +++ b/docs/source/tutorials/retriever.rst @@ -120,9 +120,14 @@ Working with ``DialogTurn`` can help manage ``conversation_history``, especiall Retriever Data Types -^^^^^^^^^^^^^^^^^^^^^^^^ -In most cases, the query is string. But there are cases we might need both text and images as a query, such as "find me a cloth that looks like this". -We defined the query type as: +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Query** + +In most cases, the query is string. But there are cases where we might need both text and images as a query, such as "find me a cloth that looks like this". +We defined the query type `RetrieverQueriesType` so that all of our retrievers should handle both single query and multiple queries at once. +For text-based retrievers, we defined `RetrieverStrQueriesType` as a string or a sequence of strings. + .. code-block:: python @@ -131,28 +136,29 @@ We defined the query type as: RetrieverQueriesType = Union[RetrieverQueryType, Sequence[RetrieverQueryType]] RetrieverStrQueriesType = Union[str, Sequence[RetrieverStrQueryType]] -As we see, our retriever should be able to handle both single query and multiple queries at once. +**Documents** -The documents are a sequence of document of any type that will be later specified by the subclass: +The documents are a sequence of documents of any type, which will be later specified by the subclass: .. code-block:: python RetrieverDocumentType = TypeVar("RetrieverDocumentType", contravariant=True) # a single document RetrieverDocumentsType = Sequence[RetrieverDocumentType] # The final documents types retriever can use +**Output** -We further define the same output format so that we can easily switch between different retrievers in our task pipeline. -Here is our output format: +We further definied the unified output data structure :class:`RetrieverOutput` so that we can easily switch between different retrievers in our task pipeline. +A retriever should return a list of `RetrieverOutput` to support multiple queries at once. This is helpful for: +(1) Batch-processing: Especially for semantic search, where multiple queries can be represented as numpy array and computed all at once, providing faster speeds than processing each query one by one. +(2) Query expansion: To increase recall, users often generate multiple queries from the original query. -.. code-block:: python - class RetrieverOutput(DataClass): - __doc__ = r"""Save the output of a single query in retrievers. - It is up to the subclass of Retriever to specify the type of query and document. - """ +.. code-block:: python + @dataclass + class RetrieverOutput(DataClass): doc_indices: List[int] = field(metadata={"desc": "List of document indices"}) doc_scores: Optional[List[float]] = field( default=None, metadata={"desc": "List of document scores"} @@ -167,11 +173,24 @@ Here is our output format: RetrieverOutputType = List[RetrieverOutput] # so to support multiple queries at once -You can find the types in :ref:`types`. The list of queries and `RetrieverOutput` can be helpful for: -(1) Batch-processing: especially for semantic search where multiple queries can be represented as numpy array and be computed all at once with faster speed than doing one by one. -(2) For `query expansion` where to increase the recall, users often generate multiple queries from the original query. +**Document and TextSplitter** + +If your documents (in text format) are too large, it is common practise to first use :class:`TextSplitter` to split the text into smaller chunks. +Please refer to the :doc:`text_splitter` tutorial on how to use it. + + + +Retriever Base Class +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Functionally, the base retriever :class:`Retriever` defines another required method ``build_index_from_documents`` where the subclass will prepare the retriever for the actual retrieval calls. +Optionally, the subclass can implement ``save_to_file`` and ``load_from_file`` to save and load the retriever to/from disk. +As the retriever is a subclass of component, you already inherited powerful serialization and deserialization methods such as ``to_dict``, ``from_dict``, and ``from_config`` to help +with the saving and loading process. As for helper attributes, we have ``indexed`` and ``index_keys`` to differentiate if the retriever is ready for retrieval and the attributes that are key to restore the functionality/states of the retriever. +It is up the subclass to decide how to decide the storage of the index, it can be in-memory, local disk, or cloud storage, or save as json or pickle file or even a db table. +As an example, :class:`BM25Retriever` has the following key attributes to index. .. code-block:: python @@ -196,24 +215,9 @@ You can find the types in :ref:`types`. The list of queries and `Ret raise NotImplementedError(f"Async retrieve is not implemented") -**Document and TextSplitter** - -If your documents(text format) are too large and it is a common practise to first use ``TextSplitter`` to split them into smaller chunks. -Please refer to :doc:`text_splitter` and our provided notebook on how to use it. - - - -Retriever Base Class -^^^^^^^^^^^^^^^^^^^^^^^^ +.. code:: python -Functionally, the base retriever :class:`Retriever` defines another required method ``build_index_from_documents`` where the subclass will prepare the retriever for the actual retrieval calls. -Optionally, the subclass can implement ``save_to_file`` and ``load_from_file`` to save and load the retriever to/from disk. -As the retriever is a subclass of component, you already inherited powerful serialization and deserialization methods such as ``to_dict``, ``from_dict``, and ``from_config`` to help -with the saving and loading process. As for helper attributes, we have ``indexed`` and ``index_keys`` to differentiate if the retriever is ready for retrieval and the attributes that are key to restore the functionality/states of the retriever. -It is up the subclass to decide how to decide the storage of the index, it can be in-memory, local disk, or cloud storage, or save as json or pickle file or even a db table. -As an example, :class:`BM25Retriever` has the following key attributes to index. -.. code:: python self.index_keys = ["nd", "t2d", "idf","doc_len","avgdl","total_documents","top_k","k1","b","epsilon","indexed"] diff --git a/lightrag/CHANGELOG.md b/lightrag/CHANGELOG.md index fd876f53..00076909 100644 --- a/lightrag/CHANGELOG.md +++ b/lightrag/CHANGELOG.md @@ -1,3 +1,6 @@ +### Added +- `Sequential` adds `acall` method. + ## [0.0.0-beta.1] - 2024-07-10 ### Added diff --git a/lightrag/lightrag/core/container.py b/lightrag/lightrag/core/container.py index 7170e1b6..6f7c14e5 100644 --- a/lightrag/lightrag/core/container.py +++ b/lightrag/lightrag/core/container.py @@ -13,7 +13,7 @@ class Sequential(Component): __doc__ = r"""A sequential container. - Follows the same design pattern as PyTorch's ``nn.Sequential``. + Adapted from PyTorch's ``nn.Sequential``. Components will be added to it in the order they are passed to the constructor. Alternatively, an ``OrderedDict`` of components can be passed in. @@ -97,7 +97,7 @@ def call(self, input: int) -> int: >>> result = seq.call(2, 3) """ - _components: Dict[str, Component] # = OrderedDict() + _components: Dict[str, Component] = OrderedDict() # type: ignore[assignment] @overload def __init__(self, *args: Component) -> None: ... @@ -114,7 +114,7 @@ def __init__(self, *args): for idx, component in enumerate(args): self.add_component(str(idx), component) - def _get_item_by_idx(self, iterator: Iterator[T], idx: int) -> T: + def _get_item_by_idx(self, iterator: Iterator[Component], idx: int) -> Component: """Get the idx-th item of the iterator.""" size = len(self) idx = operator.index(idx) @@ -132,15 +132,18 @@ def __getitem__( elif isinstance(idx, str): return self._components[idx] else: - return self._get_item_by_idx(self._components.values(), idx) + return self._get_item_by_idx(iter(self._components.values()), idx) def __setitem__(self, idx: Union[int, str], component: Component) -> None: """Set the idx-th component of the Sequential.""" if isinstance(idx, str): self._components[idx] = component else: - key: str = self._get_item_by_idx(self._components.keys(), idx) - return setattr(self, key, component) + # key: str = self._get_item_by_idx(iter(self._components.keys()), idx) + # self._components[key] = component + key_list = list(self._components.keys()) + key = key_list[idx] + self._components[key] = component def __delitem__(self, idx: Union[slice, int, str]) -> None: """Delete the idx-th component of the Sequential.""" @@ -150,15 +153,18 @@ def __delitem__(self, idx: Union[slice, int, str]) -> None: elif isinstance(idx, str): del self._components[idx] else: - key = self._get_item_by_idx(self._components.keys(), idx) + # key = self._get_item_by_idx(iter(self._components.keys()), idx) + key_list = list(self._components.keys()) + key = key_list[idx] + delattr(self, key) - # To preserve numbering - str_indices = [str(i) for i in range(len(self._components))] + + # Reordering is needed if numerical keys are used to keep the sequence self._components = OrderedDict( - list(zip(str_indices, self._components.values())) + (str(i), comp) for i, comp in enumerate(self._components.values()) ) - def __iter__(self) -> Iterable[Component]: + def __iter__(self) -> Iterator[Component]: r"""Iterates over the components of the Sequential. Examples: @@ -250,6 +256,33 @@ def call(self, *args: Any, **kwargs: Any) -> object: kwargs = {} return args[0] if len(args) == 1 else (args, kwargs) + @overload + async def acall(self, input: Any) -> object: ... + + @overload + async def acall(self, *args: Any, **kwargs: Any) -> object: ... + + async def acall(self, *args: Any, **kwargs: Any) -> object: + r"""When you for loop or multiple await calls inside each component, use acall method can potentially speed up the execution.""" + if len(args) == 1 and not kwargs: + input = args[0] + for component in self._components.values(): + input = await component(input) + return input + else: + for component in self._components.values(): + result = await component(*args, **kwargs) + if ( + isinstance(result, tuple) + and len(result) == 2 + and isinstance(result[1], dict) + ): + args, kwargs = result + else: + args = (result,) + kwargs = {} + return args[0] if len(args) == 1 else (args, kwargs) + def append(self, component: Component) -> "Sequential": r"""Appends a component to the end of the Sequential.""" idx = len(self._components) @@ -259,7 +292,7 @@ def append(self, component: Component) -> "Sequential": def insert(self, idx: int, component: Component) -> None: r"""Inserts a component at a given index in the Sequential.""" if not isinstance(component, Component): - raise AssertionError( + raise TypeError( f"component should be an instance of Component, but got {type(component)}" ) n = len(self._components) @@ -272,7 +305,6 @@ def insert(self, idx: int, component: Component) -> None: for i in range(n, idx, -1): self._components[str(i)] = self._components[str(i - 1)] self._components[str(idx)] = component - return self def extend(self, components: Iterable[Component]) -> "Sequential": r"""Extends the Sequential with components from an iterable.""" From 992ed11a0be253b649036fa30d802e93040cb9ac Mon Sep 17 00:00:00 2001 From: Li Yin Date: Sun, 14 Jul 2024 14:48:41 -0700 Subject: [PATCH 07/12] improve the read me message --- README.md | 16 +++--- docs/poetry.lock | 116 +++++++++++++++++++++--------------------- docs/source/conf.py | 6 +++ docs/source/index.rst | 6 +-- 4 files changed, 74 insertions(+), 70 deletions(-) diff --git a/README.md b/README.md index 57030906..40e8c53d 100644 --- a/README.md +++ b/README.md @@ -1,21 +1,18 @@ ![LightRAG Logo](https://raw.githubusercontent.com/SylphAI-Inc/LightRAG/main/docs/source/_static/images/LightRAG-logo-doc.jpeg) - - -[![License](https://img.shields.io/github/license/SylphAI-Inc/LightRAG)](https://opensource.org/license/MIT) [![PyPI](https://img.shields.io/pypi/v/lightRAG?style=flat-square)](https://pypi.org/project/lightRAG/) +[![GitHub star chart](https://img.shields.io/github/stars/SylphAI-Inc/LightRAG?style=flat-square)](https://star-history.com/#SylphAI-Inc/LightRAG) [![PyPI - Downloads](https://img.shields.io/pypi/dm/lightRAG?style=flat-square)](https://pypistats.org/packages/lightRAG) [![Open Issues](https://img.shields.io/github/issues-raw/SylphAI-Inc/LightRAG?style=flat-square)](https://github.com/SylphAI-Inc/LightRAG/issues) +[![License](https://img.shields.io/github/license/SylphAI-Inc/LightRAG)](https://opensource.org/license/MIT) [![](https://dcbadge.vercel.app/api/server/zt2mTPcu?compact=true&style=flat)](https://discord.gg/zt2mTPcu) ### ⚡ The Lightning Library for Large Language Model Applications ⚡ -*LightRAG* helps developers with both building and optimizing *Retriever-Agent-Generator* pipelines. -It is *light*, *modular*, and *robust*, with a 100% readable codebase. - - +*LightRAG* helps developers build and optimize *Retriever-Agent-Generator* pipelines. +Embracing a design philosophy similar to *PyTorch*, it is *light*, *modular*, and *robust*, with a 100% readable codebase. # Why LightRAG? @@ -27,7 +24,8 @@ Because of this, no library can provide out-of-the-box solutions. Users must bui This is what LightRAG is: light, modular, and robust, with a 100% readable codebase. -Further reading: [Introduction](https://lightrag.sylph.ai/), [Design Philosophy](https://lightrag.sylph.ai/tutorials/lightrag_design_philosophy.html) and [Class hierarchy](https://lightrag.sylph.ai/tutorials/class_hierarchy.html). +Further reading: [How We Started](https://www.linkedin.com/posts/li-yin-ai_both-ai-research-and-engineering-use-pytorch-activity-7189366364694892544-Uk1U?utm_source=share&utm_medium=member_desktop), +[Introduction](https://lightrag.sylph.ai/), [Design Philosophy](https://lightrag.sylph.ai/tutorials/lightrag_design_philosophy.html) and [Class hierarchy](https://lightrag.sylph.ai/tutorials/class_hierarchy.html). - -[![PyPI](https://img.shields.io/pypi/v/lightRAG?style=flat-square)](https://pypi.org/project/lightRAG/) -[![GitHub star chart](https://img.shields.io/github/stars/SylphAI-Inc/LightRAG?style=flat-square)](https://star-history.com/#SylphAI-Inc/LightRAG) -[![PyPI - Downloads](https://img.shields.io/pypi/dm/lightRAG?style=flat-square)](https://pypistats.org/packages/lightRAG) -[![Open Issues](https://img.shields.io/github/issues-raw/SylphAI-Inc/LightRAG?style=flat-square)](https://github.com/SylphAI-Inc/LightRAG/issues) -[![License](https://img.shields.io/github/license/SylphAI-Inc/LightRAG)](https://opensource.org/license/MIT) -[![](https://dcbadge.vercel.app/api/server/zt2mTPcu?compact=true&style=flat)](https://discord.gg/zt2mTPcu) + +

+ + discord-invite + +

+ +

+ + PyPI Version + + + PyPI Downloads + + + Open Issues + + + License + +

+ + + + + + ### ⚡ The Lightning Library for Large Language Model Applications ⚡ @@ -209,10 +230,6 @@ You: ## Model-agnostic -You can switch to any model simply by using a different model_client (provider) and model_kwargs. -Let's use OpenAI's gpt-3.5-turbo model on the same pipeline. - - You can switch to any model simply by using a different `model_client` (provider) and `model_kwargs`. Let's use OpenAI's `gpt-3.5-turbo` model. diff --git a/docs/source/index.rst b/docs/source/index.rst index 4fe8820d..8439d13a 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -10,6 +10,9 @@
PyPI Version + + GitHub Repo + GitHub Stars Discord License From 30100063297eea6f68a27cc549498871546858cb Mon Sep 17 00:00:00 2001 From: Li Yin Date: Mon, 15 Jul 2024 16:11:07 -0700 Subject: [PATCH 10/12] rag use case and extra packages --- .gitignore | 1 + README.md | 13 +- docs/source/get_started/installation.rst | 28 +- lightrag/CHANGELOG.md | 9 + lightrag/PACKAGING.md | 29 + lightrag/README.md | 4 +- .../components/retriever/faiss_retriever.py | 9 + lightrag/lightrag/core/db.py | 6 +- lightrag/lightrag/utils/lazy_import.py | 2 +- lightrag/poetry.lock | 904 ++++++++++++++---- lightrag/pyproject.toml | 35 +- poetry.lock | 167 ++-- tutorials/.gitignore | 1 + tutorials/rag.ipynb | 16 +- use_cases/.gitignore | 3 +- use_cases/{rag.py => rag_yaml_config.py} | 0 use_cases/simple_rag.py | 190 ++-- 17 files changed, 1013 insertions(+), 404 deletions(-) create mode 100644 lightrag/PACKAGING.md rename use_cases/{rag.py => rag_yaml_config.py} (100%) diff --git a/.gitignore b/.gitignore index b4862275..060db0dc 100644 --- a/.gitignore +++ b/.gitignore @@ -42,3 +42,4 @@ storage/ /*.dot /*.svg /*.csv +index.faiss diff --git a/README.md b/README.md index 06534eb9..232d0c3d 100644 --- a/README.md +++ b/README.md @@ -10,12 +10,15 @@ PyPI Version - - PyPI Downloads + + Try Quickstart in Colab - + + License @@ -33,7 +36,7 @@ ### ⚡ The Lightning Library for Large Language Model Applications ⚡ *LightRAG* helps developers build and optimize *Retriever-Agent-Generator* pipelines. -Embracing a design philosophy similar to *PyTorch*, LightRAG is *light*, *modular*, and *robust*, with a 100% readable codebase. +Embracing similar design pattern to *PyTorch*, LightRAG is *light*, *modular*, and *robust*, with a 100% readable codebase. # Why LightRAG? diff --git a/docs/source/get_started/installation.rst b/docs/source/get_started/installation.rst index c1a514e4..2a34cf0d 100644 --- a/docs/source/get_started/installation.rst +++ b/docs/source/get_started/installation.rst @@ -12,7 +12,14 @@ To install the package, run: pip install lightrag +If you know you will need `openai` and `faiss-cpu`, you can do so with: +.. code-block:: bash + + pip install lightrag[openai, faiss] + +.. note:: + Check the `Optional Packages` section for more information on the available packages. 2. Set up API keys ~~~~~~~~~~~~~~~~~~~ @@ -58,21 +65,24 @@ This setup ensures that LightRAG can access all necessary configurations during LightRAG currently has built-in support for (1) OpenAI, Groq, Anthropic, Google, and Cohere, and (2) FAISS and Transformers. -You can find all optional packages at :class:`utils.lazy_import.OptionalPackages`. +You can find all optional packages at :class:`OptionalPackages`. Make sure to install the necessary SDKs for the components you plan to use. Here is the list of our tested versions: .. code-block:: - openai = "^1.12.0" - groq = "^0.5.0" - faiss-cpu = "^1.8.0" - sqlalchemy = "^2.0.30" - cohere = "^5.5.8" - pgvector = "^0.2.5" - anthropic = "^0.26.0" - google-generativeai = "^0.5.4" + openai = "^1.12.0" + groq = "^0.5.0" + faiss-cpu = "^1.8.0" + sqlalchemy = "^2.0.30" + pgvector = "^0.3.1" + torch = "^2.3.1" + anthropic = "^0.31.1" + google-generativeai = "^0.7.2" + cohere = "^5.5.8" + +You can install the optional packages with either `pip install package_name` or `pip install lightrag[package_name]`. diff --git a/lightrag/CHANGELOG.md b/lightrag/CHANGELOG.md index 00076909..c3e2f4f9 100644 --- a/lightrag/CHANGELOG.md +++ b/lightrag/CHANGELOG.md @@ -1,5 +1,14 @@ +## [0.1.0-beta.2] - 2024-07-15 + +### Modified +- Make `LocalDB` a component for better visualization. +- Add extra packages in dependencides. + +## [0.1.0-beta.1] - 2024-07-15 + ### Added - `Sequential` adds `acall` method. +- Add extra packages so that users can install them with `pip install lightrag[extra]`. ## [0.0.0-beta.1] - 2024-07-10 diff --git a/lightrag/PACKAGING.md b/lightrag/PACKAGING.md new file mode 100644 index 00000000..7ecb1dbe --- /dev/null +++ b/lightrag/PACKAGING.md @@ -0,0 +1,29 @@ +#Poetry Packaging Guide +## Development + +To install optional dependencies, use the following command: + +```bash +poetry install --extras "openai groq faiss" +``` +Install more extra after the first installation, you will use the same command: + +```bash +poetry install --extras "anthropic cohere google-generativeai pgvector" +``` + +## Extra Dependencies +Add the optional package in dependencides. + +Build it locally: +```bash +poetry build +``` + +Test the package locally: + +Better to use a colab to update the whl file and test the installation. + +```bash +pip install "dist/lightrag-0.1.0b1-py3-none-any.whl[openai,groq,faiss]" +``` diff --git a/lightrag/README.md b/lightrag/README.md index 57030906..493daf74 100644 --- a/lightrag/README.md +++ b/lightrag/README.md @@ -179,8 +179,8 @@ Use the following code will let us see the prompt after it is formatted: ```python -qa2.generator.print_prompt( - output_format_str=qa2.generator.output_processors.format_instructions(), +qa.generator.print_prompt( + output_format_str=qa.generator.output_processors.format_instructions(), input_str="What is LLM?", ) ``` diff --git a/lightrag/lightrag/components/retriever/faiss_retriever.py b/lightrag/lightrag/components/retriever/faiss_retriever.py index 5aef4119..5531eae7 100644 --- a/lightrag/lightrag/components/retriever/faiss_retriever.py +++ b/lightrag/lightrag/components/retriever/faiss_retriever.py @@ -80,6 +80,15 @@ class FAISSRetriever( We choose cosine similarity and convert it to range [0, 1] by adding 1 and dividing by 2 to simulate probability in [0, 1] + Install FAISS: + + As FAISS is optional package, you can install it with pip for cpu version: + ```bash + pip install faiss-cpu + ``` + For GPU version: + You might have to use conda to install faiss-gpu:https://github.com/facebookresearch/faiss/wiki/Installing-Faiss + References: - FAISS: https://github.com/facebookresearch/faiss """ diff --git a/lightrag/lightrag/core/db.py b/lightrag/lightrag/core/db.py index 0005d60c..721e2620 100644 --- a/lightrag/lightrag/core/db.py +++ b/lightrag/lightrag/core/db.py @@ -19,7 +19,7 @@ @dataclass -class LocalDB(Generic[T]): +class LocalDB(Generic[T], Component): __doc__ = r"""LocalDB with in-memory CRUD operations, data transformation/processing pipelines, and persistence. LocalDB is highly flexible. @@ -110,6 +110,9 @@ class LocalDB(Generic[T]): default_factory=dict, metadata={"description": "Map function setup by key"} ) + def __post_init__(self): + super().__init__() + @property def length(self): return len(self.items) @@ -272,6 +275,7 @@ def add( for key, transformed_docs in transformed_items.items(): self.transformed_items[key].extend(transformed_docs) + # TODO: rename it better to add the condition filter def fetch_items(self, condition: Callable[[T], bool]) -> List[T]: """Fetch items with a condition.""" return [item for item in self.items if condition(item)] diff --git a/lightrag/lightrag/utils/lazy_import.py b/lightrag/lightrag/utils/lazy_import.py index db0d1293..12e413bf 100644 --- a/lightrag/lightrag/utils/lazy_import.py +++ b/lightrag/lightrag/utils/lazy_import.py @@ -21,7 +21,7 @@ class OptionalPackages(Enum): ANTHROPIC = ("anthropic", "Please install anthropic with: pip install anthropic") GOOGLE_GENERATIVEAI = ( "google.generativeai", - "Please install google-generativeai to use GoogleGenAIClient", + "Please install google-generativeai with: pip install google-generativeai", ) TRANSFORMERS = ( "transformers", diff --git a/lightrag/poetry.lock b/lightrag/poetry.lock index 5a453e03..001ba5c6 100644 --- a/lightrag/poetry.lock +++ b/lightrag/poetry.lock @@ -11,6 +11,31 @@ files = [ {file = "annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89"}, ] +[[package]] +name = "anthropic" +version = "0.31.1" +description = "The official Python library for the anthropic API" +optional = true +python-versions = ">=3.7" +files = [ + {file = "anthropic-0.31.1-py3-none-any.whl", hash = "sha256:d18809cbdecee2296f418e30beb2d0a8ecc225c065a1494cb02348af48794ff8"}, + {file = "anthropic-0.31.1.tar.gz", hash = "sha256:d2248dfc15f7fc7823ac0bb9d48e73429e9b1ed8327ac66839d00cdb2f29d3cb"}, +] + +[package.dependencies] +anyio = ">=3.5.0,<5" +distro = ">=1.7.0,<2" +httpx = ">=0.23.0,<1" +jiter = ">=0.4.0,<1" +pydantic = ">=1.9.0,<3" +sniffio = "*" +tokenizers = ">=0.13.0" +typing-extensions = ">=4.7,<5" + +[package.extras] +bedrock = ["boto3 (>=1.28.57)", "botocore (>=1.31.57)"] +vertex = ["google-auth (>=2,<3)"] + [[package]] name = "anyio" version = "4.4.0" @@ -63,6 +88,58 @@ files = [ {file = "backoff-2.2.1.tar.gz", hash = "sha256:03f829f5bb1923180821643f8753b0502c3b682293992485b0eef2807afa5cba"}, ] +[[package]] +name = "boto3" +version = "1.34.144" +description = "The AWS SDK for Python" +optional = true +python-versions = ">=3.8" +files = [ + {file = "boto3-1.34.144-py3-none-any.whl", hash = "sha256:b8433d481d50b68a0162c0379c0dd4aabfc3d1ad901800beb5b87815997511c1"}, + {file = "boto3-1.34.144.tar.gz", hash = "sha256:2f3e88b10b8fcc5f6100a9d74cd28230edc9d4fa226d99dd40a3ab38ac213673"}, +] + +[package.dependencies] +botocore = ">=1.34.144,<1.35.0" +jmespath = ">=0.7.1,<2.0.0" +s3transfer = ">=0.10.0,<0.11.0" + +[package.extras] +crt = ["botocore[crt] (>=1.21.0,<2.0a0)"] + +[[package]] +name = "botocore" +version = "1.34.144" +description = "Low-level, data-driven core of boto 3." +optional = true +python-versions = ">=3.8" +files = [ + {file = "botocore-1.34.144-py3-none-any.whl", hash = "sha256:a2cf26e1bf10d5917a2285e50257bc44e94a1d16574f282f3274f7a5d8d1f08b"}, + {file = "botocore-1.34.144.tar.gz", hash = "sha256:4215db28d25309d59c99507f1f77df9089e5bebbad35f6e19c7c44ec5383a3e8"}, +] + +[package.dependencies] +jmespath = ">=0.7.1,<2.0.0" +python-dateutil = ">=2.1,<3.0.0" +urllib3 = [ + {version = ">=1.25.4,<1.27", markers = "python_version < \"3.10\""}, + {version = ">=1.25.4,<2.2.0 || >2.2.0,<3", markers = "python_version >= \"3.10\""}, +] + +[package.extras] +crt = ["awscrt (==0.20.11)"] + +[[package]] +name = "cachetools" +version = "5.4.0" +description = "Extensible memoizing collections and decorators" +optional = true +python-versions = ">=3.7" +files = [ + {file = "cachetools-5.4.0-py3-none-any.whl", hash = "sha256:3ae3b49a3d5e28a77a0be2b37dbcb89005058959cb2323858c2657c4a8cab474"}, + {file = "cachetools-5.4.0.tar.gz", hash = "sha256:b8adc2e7c07f105ced7bc56dbb6dfbe7c4a00acce20e2227b3f355be89bc6827"}, +] + [[package]] name = "certifi" version = "2024.7.4" @@ -184,6 +261,29 @@ files = [ {file = "charset_normalizer-3.3.2-py3-none-any.whl", hash = "sha256:3e4d1f6587322d2788836a99c69062fbb091331ec940e02d12d179c1d53e25fc"}, ] +[[package]] +name = "cohere" +version = "5.5.8" +description = "" +optional = true +python-versions = "<4.0,>=3.8" +files = [ + {file = "cohere-5.5.8-py3-none-any.whl", hash = "sha256:e1ed84b90eadd13c6a68ee28e378a0bb955f8945eadc6eb7ee126b3399cafd54"}, + {file = "cohere-5.5.8.tar.gz", hash = "sha256:84ce7666ff8fbdf4f41fb5f6ca452ab2639a514bc88967a2854a9b1b820d6ea0"}, +] + +[package.dependencies] +boto3 = ">=1.34.0,<2.0.0" +fastavro = ">=1.9.4,<2.0.0" +httpx = ">=0.21.2" +httpx-sse = ">=0.4.0,<0.5.0" +parameterized = ">=0.9.0,<0.10.0" +pydantic = ">=1.9.2" +requests = ">=2.0.0,<3.0.0" +tokenizers = ">=0.15,<1" +types-requests = ">=2.0.0,<3.0.0" +typing_extensions = ">=4.0.0" + [[package]] name = "colorama" version = "0.4.6" @@ -219,13 +319,13 @@ files = [ [[package]] name = "exceptiongroup" -version = "1.2.1" +version = "1.2.2" description = "Backport of PEP 654 (exception groups)" optional = false python-versions = ">=3.7" files = [ - {file = "exceptiongroup-1.2.1-py3-none-any.whl", hash = "sha256:5258b9ed329c5bbdd31a309f53cbfb0b155341807f6ff7606a1e801a891b29ad"}, - {file = "exceptiongroup-1.2.1.tar.gz", hash = "sha256:a4785e48b045528f5bfe627b6ad554ff32def154f42372786903b7abcfe1aa16"}, + {file = "exceptiongroup-1.2.2-py3-none-any.whl", hash = "sha256:3111b9d131c238bec2f8f516e123e14ba243563fb135d3fe885990585aa7795b"}, + {file = "exceptiongroup-1.2.2.tar.gz", hash = "sha256:47c2edf7c6738fafb49fd34290706d1a1a2f4d1c6df275526b62cbb4aa5393cc"}, ] [package.extras] @@ -270,6 +370,52 @@ files = [ numpy = ">=1.0,<2.0" packaging = "*" +[[package]] +name = "fastavro" +version = "1.9.5" +description = "Fast read/write of AVRO files" +optional = true +python-versions = ">=3.8" +files = [ + {file = "fastavro-1.9.5-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:61253148e95dd2b6457247b441b7555074a55de17aef85f5165bfd5facf600fc"}, + {file = "fastavro-1.9.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b604935d671ad47d888efc92a106f98e9440874108b444ac10e28d643109c937"}, + {file = "fastavro-1.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0adbf4956fd53bd74c41e7855bb45ccce953e0eb0e44f5836d8d54ad843f9944"}, + {file = "fastavro-1.9.5-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:53d838e31457db8bf44460c244543f75ed307935d5fc1d93bc631cc7caef2082"}, + {file = "fastavro-1.9.5-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:07b6288e8681eede16ff077632c47395d4925c2f51545cd7a60f194454db2211"}, + {file = "fastavro-1.9.5-cp310-cp310-win_amd64.whl", hash = "sha256:ef08cf247fdfd61286ac0c41854f7194f2ad05088066a756423d7299b688d975"}, + {file = "fastavro-1.9.5-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:c52d7bb69f617c90935a3e56feb2c34d4276819a5c477c466c6c08c224a10409"}, + {file = "fastavro-1.9.5-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:85e05969956003df8fa4491614bc62fe40cec59e94d06e8aaa8d8256ee3aab82"}, + {file = "fastavro-1.9.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:06e6df8527493a9f0d9a8778df82bab8b1aa6d80d1b004e5aec0a31dc4dc501c"}, + {file = "fastavro-1.9.5-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:27820da3b17bc01cebb6d1687c9d7254b16d149ef458871aaa207ed8950f3ae6"}, + {file = "fastavro-1.9.5-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:195a5b8e33eb89a1a9b63fa9dce7a77d41b3b0cd785bac6044df619f120361a2"}, + {file = "fastavro-1.9.5-cp311-cp311-win_amd64.whl", hash = "sha256:be612c109efb727bfd36d4d7ed28eb8e0506617b7dbe746463ebbf81e85eaa6b"}, + {file = "fastavro-1.9.5-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:b133456c8975ec7d2a99e16a7e68e896e45c821b852675eac4ee25364b999c14"}, + {file = "fastavro-1.9.5-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:bf586373c3d1748cac849395aad70c198ee39295f92e7c22c75757b5c0300fbe"}, + {file = "fastavro-1.9.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:724ef192bc9c55d5b4c7df007f56a46a21809463499856349d4580a55e2b914c"}, + {file = "fastavro-1.9.5-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:bfd11fe355a8f9c0416803afac298960eb4c603a23b1c74ff9c1d3e673ea7185"}, + {file = "fastavro-1.9.5-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:9827d1654d7bcb118ef5efd3e5b2c9ab2a48d44dac5e8c6a2327bc3ac3caa828"}, + {file = "fastavro-1.9.5-cp312-cp312-win_amd64.whl", hash = "sha256:d84b69dca296667e6137ae7c9a96d060123adbc0c00532cc47012b64d38b47e9"}, + {file = "fastavro-1.9.5-cp38-cp38-macosx_11_0_universal2.whl", hash = "sha256:fb744e9de40fb1dc75354098c8db7da7636cba50a40f7bef3b3fb20f8d189d88"}, + {file = "fastavro-1.9.5-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:240df8bacd13ff5487f2465604c007d686a566df5cbc01d0550684eaf8ff014a"}, + {file = "fastavro-1.9.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c3bb35c25bbc3904e1c02333bc1ae0173e0a44aa37a8e95d07e681601246e1f1"}, + {file = "fastavro-1.9.5-cp38-cp38-musllinux_1_2_aarch64.whl", hash = "sha256:b47a54a9700de3eabefd36dabfb237808acae47bc873cada6be6990ef6b165aa"}, + {file = "fastavro-1.9.5-cp38-cp38-musllinux_1_2_x86_64.whl", hash = "sha256:48c7b5e6d2f3bf7917af301c275b05c5be3dd40bb04e80979c9e7a2ab31a00d1"}, + {file = "fastavro-1.9.5-cp38-cp38-win_amd64.whl", hash = "sha256:05d13f98d4e325be40387e27da9bd60239968862fe12769258225c62ec906f04"}, + {file = "fastavro-1.9.5-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:5b47948eb196263f6111bf34e1cd08d55529d4ed46eb50c1bc8c7c30a8d18868"}, + {file = "fastavro-1.9.5-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:85b7a66ad521298ad9373dfe1897a6ccfc38feab54a47b97922e213ae5ad8870"}, + {file = "fastavro-1.9.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:44cb154f863ad80e41aea72a709b12e1533b8728c89b9b1348af91a6154ab2f5"}, + {file = "fastavro-1.9.5-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:b5f7f2b1fe21231fd01f1a2a90e714ae267fe633cd7ce930c0aea33d1c9f4901"}, + {file = "fastavro-1.9.5-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:88fbbe16c61d90a89d78baeb5a34dc1c63a27b115adccdbd6b1fb6f787deacf2"}, + {file = "fastavro-1.9.5-cp39-cp39-win_amd64.whl", hash = "sha256:753f5eedeb5ca86004e23a9ce9b41c5f25eb64a876f95edcc33558090a7f3e4b"}, + {file = "fastavro-1.9.5.tar.gz", hash = "sha256:6419ebf45f88132a9945c51fe555d4f10bb97c236288ed01894f957c6f914553"}, +] + +[package.extras] +codecs = ["cramjam", "lz4", "zstandard"] +lz4 = ["lz4"] +snappy = ["cramjam"] +zstandard = ["zstandard"] + [[package]] name = "filelock" version = "3.15.4" @@ -325,11 +471,155 @@ test-downstream = ["aiobotocore (>=2.5.4,<3.0.0)", "dask-expr", "dask[dataframe, test-full = ["adlfs", "aiohttp (!=4.0.0a0,!=4.0.0a1)", "cloudpickle", "dask", "distributed", "dropbox", "dropboxdrivefs", "fastparquet", "fusepy", "gcsfs", "jinja2", "kerchunk", "libarchive-c", "lz4", "notebook", "numpy", "ocifs", "pandas", "panel", "paramiko", "pyarrow", "pyarrow (>=1)", "pyftpdlib", "pygit2", "pytest", "pytest-asyncio (!=0.22.0)", "pytest-benchmark", "pytest-cov", "pytest-mock", "pytest-recording", "pytest-rerunfailures", "python-snappy", "requests", "smbprotocol", "tqdm", "urllib3", "zarr", "zstandard"] tqdm = ["tqdm"] +[[package]] +name = "google-ai-generativelanguage" +version = "0.6.6" +description = "Google Ai Generativelanguage API client library" +optional = true +python-versions = ">=3.7" +files = [ + {file = "google-ai-generativelanguage-0.6.6.tar.gz", hash = "sha256:1739f035caeeeca5c28f887405eec8690f3372daf79fecf26454a97a4f1733a8"}, + {file = "google_ai_generativelanguage-0.6.6-py3-none-any.whl", hash = "sha256:59297737931f073d55ce1268dcc6d95111ee62850349d2b6cde942b16a4fca5c"}, +] + +[package.dependencies] +google-api-core = {version = ">=1.34.1,<2.0.dev0 || >=2.11.dev0,<3.0.0dev", extras = ["grpc"]} +google-auth = ">=2.14.1,<2.24.0 || >2.24.0,<2.25.0 || >2.25.0,<3.0.0dev" +proto-plus = ">=1.22.3,<2.0.0dev" +protobuf = ">=3.19.5,<3.20.0 || >3.20.0,<3.20.1 || >3.20.1,<4.21.0 || >4.21.0,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<5.0.0dev" + +[[package]] +name = "google-api-core" +version = "2.19.1" +description = "Google API client core library" +optional = true +python-versions = ">=3.7" +files = [ + {file = "google-api-core-2.19.1.tar.gz", hash = "sha256:f4695f1e3650b316a795108a76a1c416e6afb036199d1c1f1f110916df479ffd"}, + {file = "google_api_core-2.19.1-py3-none-any.whl", hash = "sha256:f12a9b8309b5e21d92483bbd47ce2c445861ec7d269ef6784ecc0ea8c1fa6125"}, +] + +[package.dependencies] +google-auth = ">=2.14.1,<3.0.dev0" +googleapis-common-protos = ">=1.56.2,<2.0.dev0" +grpcio = [ + {version = ">=1.33.2,<2.0dev", optional = true, markers = "python_version < \"3.11\" and extra == \"grpc\""}, + {version = ">=1.49.1,<2.0dev", optional = true, markers = "python_version >= \"3.11\" and extra == \"grpc\""}, +] +grpcio-status = [ + {version = ">=1.33.2,<2.0.dev0", optional = true, markers = "python_version < \"3.11\" and extra == \"grpc\""}, + {version = ">=1.49.1,<2.0.dev0", optional = true, markers = "python_version >= \"3.11\" and extra == \"grpc\""}, +] +proto-plus = ">=1.22.3,<2.0.0dev" +protobuf = ">=3.19.5,<3.20.0 || >3.20.0,<3.20.1 || >3.20.1,<4.21.0 || >4.21.0,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<6.0.0.dev0" +requests = ">=2.18.0,<3.0.0.dev0" + +[package.extras] +grpc = ["grpcio (>=1.33.2,<2.0dev)", "grpcio (>=1.49.1,<2.0dev)", "grpcio-status (>=1.33.2,<2.0.dev0)", "grpcio-status (>=1.49.1,<2.0.dev0)"] +grpcgcp = ["grpcio-gcp (>=0.2.2,<1.0.dev0)"] +grpcio-gcp = ["grpcio-gcp (>=0.2.2,<1.0.dev0)"] + +[[package]] +name = "google-api-python-client" +version = "2.137.0" +description = "Google API Client Library for Python" +optional = true +python-versions = ">=3.7" +files = [ + {file = "google_api_python_client-2.137.0-py2.py3-none-any.whl", hash = "sha256:a8b5c5724885e5be9f5368739aa0ccf416627da4ebd914b410a090c18f84d692"}, + {file = "google_api_python_client-2.137.0.tar.gz", hash = "sha256:e739cb74aac8258b1886cb853b0722d47c81fe07ad649d7f2206f06530513c04"}, +] + +[package.dependencies] +google-api-core = ">=1.31.5,<2.0.dev0 || >2.3.0,<3.0.0.dev0" +google-auth = ">=1.32.0,<2.24.0 || >2.24.0,<2.25.0 || >2.25.0,<3.0.0.dev0" +google-auth-httplib2 = ">=0.2.0,<1.0.0" +httplib2 = ">=0.19.0,<1.dev0" +uritemplate = ">=3.0.1,<5" + +[[package]] +name = "google-auth" +version = "2.32.0" +description = "Google Authentication Library" +optional = true +python-versions = ">=3.7" +files = [ + {file = "google_auth-2.32.0-py2.py3-none-any.whl", hash = "sha256:53326ea2ebec768070a94bee4e1b9194c9646ea0c2bd72422785bd0f9abfad7b"}, + {file = "google_auth-2.32.0.tar.gz", hash = "sha256:49315be72c55a6a37d62819e3573f6b416aca00721f7e3e31a008d928bf64022"}, +] + +[package.dependencies] +cachetools = ">=2.0.0,<6.0" +pyasn1-modules = ">=0.2.1" +rsa = ">=3.1.4,<5" + +[package.extras] +aiohttp = ["aiohttp (>=3.6.2,<4.0.0.dev0)", "requests (>=2.20.0,<3.0.0.dev0)"] +enterprise-cert = ["cryptography (==36.0.2)", "pyopenssl (==22.0.0)"] +pyopenssl = ["cryptography (>=38.0.3)", "pyopenssl (>=20.0.0)"] +reauth = ["pyu2f (>=0.1.5)"] +requests = ["requests (>=2.20.0,<3.0.0.dev0)"] + +[[package]] +name = "google-auth-httplib2" +version = "0.2.0" +description = "Google Authentication Library: httplib2 transport" +optional = true +python-versions = "*" +files = [ + {file = "google-auth-httplib2-0.2.0.tar.gz", hash = "sha256:38aa7badf48f974f1eb9861794e9c0cb2a0511a4ec0679b1f886d108f5640e05"}, + {file = "google_auth_httplib2-0.2.0-py2.py3-none-any.whl", hash = "sha256:b65a0a2123300dd71281a7bf6e64d65a0759287df52729bdd1ae2e47dc311a3d"}, +] + +[package.dependencies] +google-auth = "*" +httplib2 = ">=0.19.0" + +[[package]] +name = "google-generativeai" +version = "0.7.2" +description = "Google Generative AI High level API client library and tools." +optional = true +python-versions = ">=3.9" +files = [ + {file = "google_generativeai-0.7.2-py3-none-any.whl", hash = "sha256:3117d1ebc92ee77710d4bc25ab4763492fddce9b6332eb25d124cf5d8b78b339"}, +] + +[package.dependencies] +google-ai-generativelanguage = "0.6.6" +google-api-core = "*" +google-api-python-client = "*" +google-auth = ">=2.15.0" +protobuf = "*" +pydantic = "*" +tqdm = "*" +typing-extensions = "*" + +[package.extras] +dev = ["Pillow", "absl-py", "black", "ipython", "nose2", "pandas", "pytype", "pyyaml"] + +[[package]] +name = "googleapis-common-protos" +version = "1.63.2" +description = "Common protobufs used in Google APIs" +optional = true +python-versions = ">=3.7" +files = [ + {file = "googleapis-common-protos-1.63.2.tar.gz", hash = "sha256:27c5abdffc4911f28101e635de1533fb4cfd2c37fbaa9174587c799fac90aa87"}, + {file = "googleapis_common_protos-1.63.2-py2.py3-none-any.whl", hash = "sha256:27a2499c7e8aff199665b22741997e485eccc8645aa9176c7c988e6fae507945"}, +] + +[package.dependencies] +protobuf = ">=3.20.2,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<6.0.0.dev0" + +[package.extras] +grpc = ["grpcio (>=1.44.0,<2.0.0.dev0)"] + [[package]] name = "greenlet" version = "3.0.3" description = "Lightweight in-process concurrent programming" -optional = false +optional = true python-versions = ">=3.7" files = [ {file = "greenlet-3.0.3-cp310-cp310-macosx_11_0_universal2.whl", hash = "sha256:9da2bd29ed9e4f15955dd1595ad7bc9320308a3b766ef7f837e23ad4b4aac31a"}, @@ -400,7 +690,7 @@ test = ["objgraph", "psutil"] name = "groq" version = "0.5.0" description = "The official Python library for the groq API" -optional = false +optional = true python-versions = ">=3.7" files = [ {file = "groq-0.5.0-py3-none-any.whl", hash = "sha256:a7e6be1118bcdfea3ed071ec00f505a34d4e6ec28c435adb5a5afd33545683a1"}, @@ -415,6 +705,80 @@ pydantic = ">=1.9.0,<3" sniffio = "*" typing-extensions = ">=4.7,<5" +[[package]] +name = "grpcio" +version = "1.64.1" +description = "HTTP/2-based RPC framework" +optional = true +python-versions = ">=3.8" +files = [ + {file = "grpcio-1.64.1-cp310-cp310-linux_armv7l.whl", hash = "sha256:55697ecec192bc3f2f3cc13a295ab670f51de29884ca9ae6cd6247df55df2502"}, + {file = "grpcio-1.64.1-cp310-cp310-macosx_12_0_universal2.whl", hash = "sha256:3b64ae304c175671efdaa7ec9ae2cc36996b681eb63ca39c464958396697daff"}, + {file = "grpcio-1.64.1-cp310-cp310-manylinux_2_17_aarch64.whl", hash = "sha256:bac71b4b28bc9af61efcdc7630b166440bbfbaa80940c9a697271b5e1dabbc61"}, + {file = "grpcio-1.64.1-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6c024ffc22d6dc59000faf8ad781696d81e8e38f4078cb0f2630b4a3cf231a90"}, + {file = "grpcio-1.64.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e7cd5c1325f6808b8ae31657d281aadb2a51ac11ab081ae335f4f7fc44c1721d"}, + {file = "grpcio-1.64.1-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:0a2813093ddb27418a4c99f9b1c223fab0b053157176a64cc9db0f4557b69bd9"}, + {file = "grpcio-1.64.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:2981c7365a9353f9b5c864595c510c983251b1ab403e05b1ccc70a3d9541a73b"}, + {file = "grpcio-1.64.1-cp310-cp310-win32.whl", hash = "sha256:1262402af5a511c245c3ae918167eca57342c72320dffae5d9b51840c4b2f86d"}, + {file = "grpcio-1.64.1-cp310-cp310-win_amd64.whl", hash = "sha256:19264fc964576ddb065368cae953f8d0514ecc6cb3da8903766d9fb9d4554c33"}, + {file = "grpcio-1.64.1-cp311-cp311-linux_armv7l.whl", hash = "sha256:58b1041e7c870bb30ee41d3090cbd6f0851f30ae4eb68228955d973d3efa2e61"}, + {file = "grpcio-1.64.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:bbc5b1d78a7822b0a84c6f8917faa986c1a744e65d762ef6d8be9d75677af2ca"}, + {file = "grpcio-1.64.1-cp311-cp311-manylinux_2_17_aarch64.whl", hash = "sha256:5841dd1f284bd1b3d8a6eca3a7f062b06f1eec09b184397e1d1d43447e89a7ae"}, + {file = "grpcio-1.64.1-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8caee47e970b92b3dd948371230fcceb80d3f2277b3bf7fbd7c0564e7d39068e"}, + {file = "grpcio-1.64.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:73819689c169417a4f978e562d24f2def2be75739c4bed1992435d007819da1b"}, + {file = "grpcio-1.64.1-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:6503b64c8b2dfad299749cad1b595c650c91e5b2c8a1b775380fcf8d2cbba1e9"}, + {file = "grpcio-1.64.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:1de403fc1305fd96cfa75e83be3dee8538f2413a6b1685b8452301c7ba33c294"}, + {file = "grpcio-1.64.1-cp311-cp311-win32.whl", hash = "sha256:d4d29cc612e1332237877dfa7fe687157973aab1d63bd0f84cf06692f04c0367"}, + {file = "grpcio-1.64.1-cp311-cp311-win_amd64.whl", hash = "sha256:5e56462b05a6f860b72f0fa50dca06d5b26543a4e88d0396259a07dc30f4e5aa"}, + {file = "grpcio-1.64.1-cp312-cp312-linux_armv7l.whl", hash = "sha256:4657d24c8063e6095f850b68f2d1ba3b39f2b287a38242dcabc166453e950c59"}, + {file = "grpcio-1.64.1-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:62b4e6eb7bf901719fce0ca83e3ed474ae5022bb3827b0a501e056458c51c0a1"}, + {file = "grpcio-1.64.1-cp312-cp312-manylinux_2_17_aarch64.whl", hash = "sha256:ee73a2f5ca4ba44fa33b4d7d2c71e2c8a9e9f78d53f6507ad68e7d2ad5f64a22"}, + {file = "grpcio-1.64.1-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:198908f9b22e2672a998870355e226a725aeab327ac4e6ff3a1399792ece4762"}, + {file = "grpcio-1.64.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:39b9d0acaa8d835a6566c640f48b50054f422d03e77e49716d4c4e8e279665a1"}, + {file = "grpcio-1.64.1-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:5e42634a989c3aa6049f132266faf6b949ec2a6f7d302dbb5c15395b77d757eb"}, + {file = "grpcio-1.64.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:b1a82e0b9b3022799c336e1fc0f6210adc019ae84efb7321d668129d28ee1efb"}, + {file = "grpcio-1.64.1-cp312-cp312-win32.whl", hash = "sha256:55260032b95c49bee69a423c2f5365baa9369d2f7d233e933564d8a47b893027"}, + {file = "grpcio-1.64.1-cp312-cp312-win_amd64.whl", hash = "sha256:c1a786ac592b47573a5bb7e35665c08064a5d77ab88a076eec11f8ae86b3e3f6"}, + {file = "grpcio-1.64.1-cp38-cp38-linux_armv7l.whl", hash = "sha256:a011ac6c03cfe162ff2b727bcb530567826cec85eb8d4ad2bfb4bd023287a52d"}, + {file = "grpcio-1.64.1-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:4d6dab6124225496010bd22690f2d9bd35c7cbb267b3f14e7a3eb05c911325d4"}, + {file = "grpcio-1.64.1-cp38-cp38-manylinux_2_17_aarch64.whl", hash = "sha256:a5e771d0252e871ce194d0fdcafd13971f1aae0ddacc5f25615030d5df55c3a2"}, + {file = "grpcio-1.64.1-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:2c3c1b90ab93fed424e454e93c0ed0b9d552bdf1b0929712b094f5ecfe7a23ad"}, + {file = "grpcio-1.64.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:20405cb8b13fd779135df23fabadc53b86522d0f1cba8cca0e87968587f50650"}, + {file = "grpcio-1.64.1-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:0cc79c982ccb2feec8aad0e8fb0d168bcbca85bc77b080d0d3c5f2f15c24ea8f"}, + {file = "grpcio-1.64.1-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:a3a035c37ce7565b8f4f35ff683a4db34d24e53dc487e47438e434eb3f701b2a"}, + {file = "grpcio-1.64.1-cp38-cp38-win32.whl", hash = "sha256:1257b76748612aca0f89beec7fa0615727fd6f2a1ad580a9638816a4b2eb18fd"}, + {file = "grpcio-1.64.1-cp38-cp38-win_amd64.whl", hash = "sha256:0a12ddb1678ebc6a84ec6b0487feac020ee2b1659cbe69b80f06dbffdb249122"}, + {file = "grpcio-1.64.1-cp39-cp39-linux_armv7l.whl", hash = "sha256:75dbbf415026d2862192fe1b28d71f209e2fd87079d98470db90bebe57b33179"}, + {file = "grpcio-1.64.1-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:e3d9f8d1221baa0ced7ec7322a981e28deb23749c76eeeb3d33e18b72935ab62"}, + {file = "grpcio-1.64.1-cp39-cp39-manylinux_2_17_aarch64.whl", hash = "sha256:5f8b75f64d5d324c565b263c67dbe4f0af595635bbdd93bb1a88189fc62ed2e5"}, + {file = "grpcio-1.64.1-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c84ad903d0d94311a2b7eea608da163dace97c5fe9412ea311e72c3684925602"}, + {file = "grpcio-1.64.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:940e3ec884520155f68a3b712d045e077d61c520a195d1a5932c531f11883489"}, + {file = "grpcio-1.64.1-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:f10193c69fc9d3d726e83bbf0f3d316f1847c3071c8c93d8090cf5f326b14309"}, + {file = "grpcio-1.64.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:ac15b6c2c80a4d1338b04d42a02d376a53395ddf0ec9ab157cbaf44191f3ffdd"}, + {file = "grpcio-1.64.1-cp39-cp39-win32.whl", hash = "sha256:03b43d0ccf99c557ec671c7dede64f023c7da9bb632ac65dbc57f166e4970040"}, + {file = "grpcio-1.64.1-cp39-cp39-win_amd64.whl", hash = "sha256:ed6091fa0adcc7e4ff944090cf203a52da35c37a130efa564ded02b7aff63bcd"}, + {file = "grpcio-1.64.1.tar.gz", hash = "sha256:8d51dd1c59d5fa0f34266b80a3805ec29a1f26425c2a54736133f6d87fc4968a"}, +] + +[package.extras] +protobuf = ["grpcio-tools (>=1.64.1)"] + +[[package]] +name = "grpcio-status" +version = "1.62.2" +description = "Status proto mapping for gRPC" +optional = true +python-versions = ">=3.6" +files = [ + {file = "grpcio-status-1.62.2.tar.gz", hash = "sha256:62e1bfcb02025a1cd73732a2d33672d3e9d0df4d21c12c51e0bbcaf09bab742a"}, + {file = "grpcio_status-1.62.2-py3-none-any.whl", hash = "sha256:206ddf0eb36bc99b033f03b2c8e95d319f0044defae9b41ae21408e7e0cda48f"}, +] + +[package.dependencies] +googleapis-common-protos = ">=1.5.5" +grpcio = ">=1.62.2" +protobuf = ">=4.21.6" + [[package]] name = "h11" version = "0.14.0" @@ -447,6 +811,20 @@ http2 = ["h2 (>=3,<5)"] socks = ["socksio (==1.*)"] trio = ["trio (>=0.22.0,<0.26.0)"] +[[package]] +name = "httplib2" +version = "0.22.0" +description = "A comprehensive HTTP client library." +optional = true +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" +files = [ + {file = "httplib2-0.22.0-py3-none-any.whl", hash = "sha256:14ae0a53c1ba8f3d37e9e27cf37eabb0fb9980f435ba405d546948b009dd64dc"}, + {file = "httplib2-0.22.0.tar.gz", hash = "sha256:d7a10bc5ef5ab08322488bde8c726eeee5c8618723fdb399597ec58f3d82df81"}, +] + +[package.dependencies] +pyparsing = {version = ">=2.4.2,<3.0.0 || >3.0.0,<3.0.1 || >3.0.1,<3.0.2 || >3.0.2,<3.0.3 || >3.0.3,<4", markers = "python_version > \"3.0\""} + [[package]] name = "httpx" version = "0.27.0" @@ -471,11 +849,22 @@ cli = ["click (==8.*)", "pygments (==2.*)", "rich (>=10,<14)"] http2 = ["h2 (>=3,<5)"] socks = ["socksio (==1.*)"] +[[package]] +name = "httpx-sse" +version = "0.4.0" +description = "Consume Server-Sent Event (SSE) messages with HTTPX." +optional = true +python-versions = ">=3.8" +files = [ + {file = "httpx-sse-0.4.0.tar.gz", hash = "sha256:1e81a3a3070ce322add1d3529ed42eb5f70817f45ed6ec915ab753f961139721"}, + {file = "httpx_sse-0.4.0-py3-none-any.whl", hash = "sha256:f329af6eae57eaa2bdfd962b42524764af68075ea87370a2de920af5341e318f"}, +] + [[package]] name = "huggingface-hub" version = "0.23.4" description = "Client library to download and publish models, datasets and other repos on the huggingface.co hub" -optional = false +optional = true python-versions = ">=3.8.0" files = [ {file = "huggingface_hub-0.23.4-py3-none-any.whl", hash = "sha256:3a0b957aa87150addf0cc7bd71b4d954b78e749850e1e7fb29ebbd2db64ca037"}, @@ -572,6 +961,87 @@ MarkupSafe = ">=2.0" [package.extras] i18n = ["Babel (>=2.7)"] +[[package]] +name = "jiter" +version = "0.5.0" +description = "Fast iterable JSON parser." +optional = true +python-versions = ">=3.8" +files = [ + {file = "jiter-0.5.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:b599f4e89b3def9a94091e6ee52e1d7ad7bc33e238ebb9c4c63f211d74822c3f"}, + {file = "jiter-0.5.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:2a063f71c4b06225543dddadbe09d203dc0c95ba352d8b85f1221173480a71d5"}, + {file = "jiter-0.5.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:acc0d5b8b3dd12e91dd184b87273f864b363dfabc90ef29a1092d269f18c7e28"}, + {file = "jiter-0.5.0-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:c22541f0b672f4d741382a97c65609332a783501551445ab2df137ada01e019e"}, + {file = "jiter-0.5.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:63314832e302cc10d8dfbda0333a384bf4bcfce80d65fe99b0f3c0da8945a91a"}, + {file = "jiter-0.5.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:a25fbd8a5a58061e433d6fae6d5298777c0814a8bcefa1e5ecfff20c594bd749"}, + {file = "jiter-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:503b2c27d87dfff5ab717a8200fbbcf4714516c9d85558048b1fc14d2de7d8dc"}, + {file = "jiter-0.5.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:6d1f3d27cce923713933a844872d213d244e09b53ec99b7a7fdf73d543529d6d"}, + {file = "jiter-0.5.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:c95980207b3998f2c3b3098f357994d3fd7661121f30669ca7cb945f09510a87"}, + {file = "jiter-0.5.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:afa66939d834b0ce063f57d9895e8036ffc41c4bd90e4a99631e5f261d9b518e"}, + {file = "jiter-0.5.0-cp310-none-win32.whl", hash = "sha256:f16ca8f10e62f25fd81d5310e852df6649af17824146ca74647a018424ddeccf"}, + {file = "jiter-0.5.0-cp310-none-win_amd64.whl", hash = "sha256:b2950e4798e82dd9176935ef6a55cf6a448b5c71515a556da3f6b811a7844f1e"}, + {file = "jiter-0.5.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:d4c8e1ed0ef31ad29cae5ea16b9e41529eb50a7fba70600008e9f8de6376d553"}, + {file = "jiter-0.5.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:c6f16e21276074a12d8421692515b3fd6d2ea9c94fd0734c39a12960a20e85f3"}, + {file = "jiter-0.5.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5280e68e7740c8c128d3ae5ab63335ce6d1fb6603d3b809637b11713487af9e6"}, + {file = "jiter-0.5.0-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:583c57fc30cc1fec360e66323aadd7fc3edeec01289bfafc35d3b9dcb29495e4"}, + {file = "jiter-0.5.0-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:26351cc14507bdf466b5f99aba3df3143a59da75799bf64a53a3ad3155ecded9"}, + {file = "jiter-0.5.0-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4829df14d656b3fb87e50ae8b48253a8851c707da9f30d45aacab2aa2ba2d614"}, + {file = "jiter-0.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a42a4bdcf7307b86cb863b2fb9bb55029b422d8f86276a50487982d99eed7c6e"}, + {file = "jiter-0.5.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:04d461ad0aebf696f8da13c99bc1b3e06f66ecf6cfd56254cc402f6385231c06"}, + {file = "jiter-0.5.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:e6375923c5f19888c9226582a124b77b622f8fd0018b843c45eeb19d9701c403"}, + {file = "jiter-0.5.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:2cec323a853c24fd0472517113768c92ae0be8f8c384ef4441d3632da8baa646"}, + {file = "jiter-0.5.0-cp311-none-win32.whl", hash = "sha256:aa1db0967130b5cab63dfe4d6ff547c88b2a394c3410db64744d491df7f069bb"}, + {file = "jiter-0.5.0-cp311-none-win_amd64.whl", hash = "sha256:aa9d2b85b2ed7dc7697597dcfaac66e63c1b3028652f751c81c65a9f220899ae"}, + {file = "jiter-0.5.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:9f664e7351604f91dcdd557603c57fc0d551bc65cc0a732fdacbf73ad335049a"}, + {file = "jiter-0.5.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:044f2f1148b5248ad2c8c3afb43430dccf676c5a5834d2f5089a4e6c5bbd64df"}, + {file = "jiter-0.5.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:702e3520384c88b6e270c55c772d4bd6d7b150608dcc94dea87ceba1b6391248"}, + {file = "jiter-0.5.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:528d742dcde73fad9d63e8242c036ab4a84389a56e04efd854062b660f559544"}, + {file = "jiter-0.5.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8cf80e5fe6ab582c82f0c3331df27a7e1565e2dcf06265afd5173d809cdbf9ba"}, + {file = "jiter-0.5.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:44dfc9ddfb9b51a5626568ef4e55ada462b7328996294fe4d36de02fce42721f"}, + {file = "jiter-0.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c451f7922992751a936b96c5f5b9bb9312243d9b754c34b33d0cb72c84669f4e"}, + {file = "jiter-0.5.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:308fce789a2f093dca1ff91ac391f11a9f99c35369117ad5a5c6c4903e1b3e3a"}, + {file = "jiter-0.5.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:7f5ad4a7c6b0d90776fdefa294f662e8a86871e601309643de30bf94bb93a64e"}, + {file = "jiter-0.5.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:ea189db75f8eca08807d02ae27929e890c7d47599ce3d0a6a5d41f2419ecf338"}, + {file = "jiter-0.5.0-cp312-none-win32.whl", hash = "sha256:e3bbe3910c724b877846186c25fe3c802e105a2c1fc2b57d6688b9f8772026e4"}, + {file = "jiter-0.5.0-cp312-none-win_amd64.whl", hash = "sha256:a586832f70c3f1481732919215f36d41c59ca080fa27a65cf23d9490e75b2ef5"}, + {file = "jiter-0.5.0-cp38-cp38-macosx_10_12_x86_64.whl", hash = "sha256:f04bc2fc50dc77be9d10f73fcc4e39346402ffe21726ff41028f36e179b587e6"}, + {file = "jiter-0.5.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:6f433a4169ad22fcb550b11179bb2b4fd405de9b982601914ef448390b2954f3"}, + {file = "jiter-0.5.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ad4a6398c85d3a20067e6c69890ca01f68659da94d74c800298581724e426c7e"}, + {file = "jiter-0.5.0-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:6baa88334e7af3f4d7a5c66c3a63808e5efbc3698a1c57626541ddd22f8e4fbf"}, + {file = "jiter-0.5.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1ece0a115c05efca597c6d938f88c9357c843f8c245dbbb53361a1c01afd7148"}, + {file = "jiter-0.5.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:335942557162ad372cc367ffaf93217117401bf930483b4b3ebdb1223dbddfa7"}, + {file = "jiter-0.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:649b0ee97a6e6da174bffcb3c8c051a5935d7d4f2f52ea1583b5b3e7822fbf14"}, + {file = "jiter-0.5.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:f4be354c5de82157886ca7f5925dbda369b77344b4b4adf2723079715f823989"}, + {file = "jiter-0.5.0-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:5206144578831a6de278a38896864ded4ed96af66e1e63ec5dd7f4a1fce38a3a"}, + {file = "jiter-0.5.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:8120c60f8121ac3d6f072b97ef0e71770cc72b3c23084c72c4189428b1b1d3b6"}, + {file = "jiter-0.5.0-cp38-none-win32.whl", hash = "sha256:6f1223f88b6d76b519cb033a4d3687ca157c272ec5d6015c322fc5b3074d8a5e"}, + {file = "jiter-0.5.0-cp38-none-win_amd64.whl", hash = "sha256:c59614b225d9f434ea8fc0d0bec51ef5fa8c83679afedc0433905994fb36d631"}, + {file = "jiter-0.5.0-cp39-cp39-macosx_10_12_x86_64.whl", hash = "sha256:0af3838cfb7e6afee3f00dc66fa24695199e20ba87df26e942820345b0afc566"}, + {file = "jiter-0.5.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:550b11d669600dbc342364fd4adbe987f14d0bbedaf06feb1b983383dcc4b961"}, + {file = "jiter-0.5.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:489875bf1a0ffb3cb38a727b01e6673f0f2e395b2aad3c9387f94187cb214bbf"}, + {file = "jiter-0.5.0-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:b250ca2594f5599ca82ba7e68785a669b352156260c5362ea1b4e04a0f3e2389"}, + {file = "jiter-0.5.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8ea18e01f785c6667ca15407cd6dabbe029d77474d53595a189bdc813347218e"}, + {file = "jiter-0.5.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:462a52be85b53cd9bffd94e2d788a09984274fe6cebb893d6287e1c296d50653"}, + {file = "jiter-0.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:92cc68b48d50fa472c79c93965e19bd48f40f207cb557a8346daa020d6ba973b"}, + {file = "jiter-0.5.0-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1c834133e59a8521bc87ebcad773608c6fa6ab5c7a022df24a45030826cf10bc"}, + {file = "jiter-0.5.0-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:ab3a71ff31cf2d45cb216dc37af522d335211f3a972d2fe14ea99073de6cb104"}, + {file = "jiter-0.5.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:cccd3af9c48ac500c95e1bcbc498020c87e1781ff0345dd371462d67b76643eb"}, + {file = "jiter-0.5.0-cp39-none-win32.whl", hash = "sha256:368084d8d5c4fc40ff7c3cc513c4f73e02c85f6009217922d0823a48ee7adf61"}, + {file = "jiter-0.5.0-cp39-none-win_amd64.whl", hash = "sha256:ce03f7b4129eb72f1687fa11300fbf677b02990618428934662406d2a76742a1"}, + {file = "jiter-0.5.0.tar.gz", hash = "sha256:1d916ba875bcab5c5f7d927df998c4cb694d27dceddf3392e58beaf10563368a"}, +] + +[[package]] +name = "jmespath" +version = "1.0.1" +description = "JSON Matching Expressions" +optional = true +python-versions = ">=3.7" +files = [ + {file = "jmespath-1.0.1-py3-none-any.whl", hash = "sha256:02e2e4cc71b5bcab88332eebf907519190dd9e6e82107fa7f83b1003a6252980"}, + {file = "jmespath-1.0.1.tar.gz", hash = "sha256:90261b206d6defd58fdd5e85f478bf633a2901798906be2ad389150c5c60edbe"}, +] + [[package]] name = "jsonlines" version = "4.0.0" @@ -998,6 +1468,33 @@ files = [ {file = "packaging-24.1.tar.gz", hash = "sha256:026ed72c8ed3fcce5bf8950572258698927fd1dbda10a5e981cdf0ac37f4f002"}, ] +[[package]] +name = "parameterized" +version = "0.9.0" +description = "Parameterized testing with any Python test framework" +optional = true +python-versions = ">=3.7" +files = [ + {file = "parameterized-0.9.0-py2.py3-none-any.whl", hash = "sha256:4e0758e3d41bea3bbd05ec14fc2c24736723f243b28d702081aef438c9372b1b"}, + {file = "parameterized-0.9.0.tar.gz", hash = "sha256:7fc905272cefa4f364c1a3429cbbe9c0f98b793988efb5bf90aac80f08db09b1"}, +] + +[package.extras] +dev = ["jinja2"] + +[[package]] +name = "pgvector" +version = "0.3.1" +description = "pgvector support for Python" +optional = true +python-versions = ">=3.8" +files = [ + {file = "pgvector-0.3.1-py2.py3-none-any.whl", hash = "sha256:7da0629915083a9769b9a73481efb4cdc9122cfd35fc7a9248ce43d177a9c8e8"}, +] + +[package.dependencies] +numpy = "*" + [[package]] name = "platformdirs" version = "4.2.2" @@ -1047,6 +1544,68 @@ nodeenv = ">=0.11.1" pyyaml = ">=5.1" virtualenv = ">=20.10.0" +[[package]] +name = "proto-plus" +version = "1.24.0" +description = "Beautiful, Pythonic protocol buffers." +optional = true +python-versions = ">=3.7" +files = [ + {file = "proto-plus-1.24.0.tar.gz", hash = "sha256:30b72a5ecafe4406b0d339db35b56c4059064e69227b8c3bda7462397f966445"}, + {file = "proto_plus-1.24.0-py3-none-any.whl", hash = "sha256:402576830425e5f6ce4c2a6702400ac79897dab0b4343821aa5188b0fab81a12"}, +] + +[package.dependencies] +protobuf = ">=3.19.0,<6.0.0dev" + +[package.extras] +testing = ["google-api-core (>=1.31.5)"] + +[[package]] +name = "protobuf" +version = "4.25.3" +description = "" +optional = true +python-versions = ">=3.8" +files = [ + {file = "protobuf-4.25.3-cp310-abi3-win32.whl", hash = "sha256:d4198877797a83cbfe9bffa3803602bbe1625dc30d8a097365dbc762e5790faa"}, + {file = "protobuf-4.25.3-cp310-abi3-win_amd64.whl", hash = "sha256:209ba4cc916bab46f64e56b85b090607a676f66b473e6b762e6f1d9d591eb2e8"}, + {file = "protobuf-4.25.3-cp37-abi3-macosx_10_9_universal2.whl", hash = "sha256:f1279ab38ecbfae7e456a108c5c0681e4956d5b1090027c1de0f934dfdb4b35c"}, + {file = "protobuf-4.25.3-cp37-abi3-manylinux2014_aarch64.whl", hash = "sha256:e7cb0ae90dd83727f0c0718634ed56837bfeeee29a5f82a7514c03ee1364c019"}, + {file = "protobuf-4.25.3-cp37-abi3-manylinux2014_x86_64.whl", hash = "sha256:7c8daa26095f82482307bc717364e7c13f4f1c99659be82890dcfc215194554d"}, + {file = "protobuf-4.25.3-cp38-cp38-win32.whl", hash = "sha256:f4f118245c4a087776e0a8408be33cf09f6c547442c00395fbfb116fac2f8ac2"}, + {file = "protobuf-4.25.3-cp38-cp38-win_amd64.whl", hash = "sha256:c053062984e61144385022e53678fbded7aea14ebb3e0305ae3592fb219ccfa4"}, + {file = "protobuf-4.25.3-cp39-cp39-win32.whl", hash = "sha256:19b270aeaa0099f16d3ca02628546b8baefe2955bbe23224aaf856134eccf1e4"}, + {file = "protobuf-4.25.3-cp39-cp39-win_amd64.whl", hash = "sha256:e3c97a1555fd6388f857770ff8b9703083de6bf1f9274a002a332d65fbb56c8c"}, + {file = "protobuf-4.25.3-py3-none-any.whl", hash = "sha256:f0700d54bcf45424477e46a9f0944155b46fb0639d69728739c0e47bab83f2b9"}, + {file = "protobuf-4.25.3.tar.gz", hash = "sha256:25b5d0b42fd000320bd7830b349e3b696435f3b329810427a6bcce6a5492cc5c"}, +] + +[[package]] +name = "pyasn1" +version = "0.6.0" +description = "Pure-Python implementation of ASN.1 types and DER/BER/CER codecs (X.208)" +optional = true +python-versions = ">=3.8" +files = [ + {file = "pyasn1-0.6.0-py2.py3-none-any.whl", hash = "sha256:cca4bb0f2df5504f02f6f8a775b6e416ff9b0b3b16f7ee80b5a3153d9b804473"}, + {file = "pyasn1-0.6.0.tar.gz", hash = "sha256:3a35ab2c4b5ef98e17dfdec8ab074046fbda76e281c5a706ccd82328cfc8f64c"}, +] + +[[package]] +name = "pyasn1-modules" +version = "0.4.0" +description = "A collection of ASN.1-based protocols modules" +optional = true +python-versions = ">=3.8" +files = [ + {file = "pyasn1_modules-0.4.0-py3-none-any.whl", hash = "sha256:be04f15b66c206eed667e0bb5ab27e2b1855ea54a842e5037738099e8ca4ae0b"}, + {file = "pyasn1_modules-0.4.0.tar.gz", hash = "sha256:831dbcea1b177b28c9baddf4c6d1013c24c3accd14a1873fffaa6a2e905f17b6"}, +] + +[package.dependencies] +pyasn1 = ">=0.4.6,<0.7.0" + [[package]] name = "pydantic" version = "2.8.2" @@ -1062,8 +1621,8 @@ files = [ annotated-types = ">=0.4.0" pydantic-core = "2.20.1" typing-extensions = [ - {version = ">=4.12.2", markers = "python_version >= \"3.13\""}, {version = ">=4.6.1", markers = "python_version < \"3.13\""}, + {version = ">=4.12.2", markers = "python_version >= \"3.13\""}, ] [package.extras] @@ -1170,6 +1729,20 @@ files = [ [package.dependencies] typing-extensions = ">=4.6.0,<4.7.0 || >4.7.0" +[[package]] +name = "pyparsing" +version = "3.1.2" +description = "pyparsing module - Classes and methods to define and execute parsing grammars" +optional = true +python-versions = ">=3.6.8" +files = [ + {file = "pyparsing-3.1.2-py3-none-any.whl", hash = "sha256:f9db75911801ed778fe61bb643079ff86601aca99fcae6345aa67292038fb742"}, + {file = "pyparsing-3.1.2.tar.gz", hash = "sha256:a1bac0ce561155ecc3ed78ca94d3c9378656ad4c94c1270de543f621420f94ad"}, +] + +[package.extras] +diagrams = ["jinja2", "railroad-diagrams"] + [[package]] name = "pytest" version = "8.2.2" @@ -1209,6 +1782,20 @@ pytest = ">=6.2.5" [package.extras] dev = ["pre-commit", "pytest-asyncio", "tox"] +[[package]] +name = "python-dateutil" +version = "2.9.0.post0" +description = "Extensions to the standard Python datetime module" +optional = true +python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,>=2.7" +files = [ + {file = "python-dateutil-2.9.0.post0.tar.gz", hash = "sha256:37dd54208da7e1cd875388217d5e00ebd4179249f90fb72437e91a35459a0ad3"}, + {file = "python_dateutil-2.9.0.post0-py2.py3-none-any.whl", hash = "sha256:a8b2bc7bffae282281c8140a97d3aa9c14da0b136dfe83f850eea9a5f7470427"}, +] + +[package.dependencies] +six = ">=1.5" + [[package]] name = "python-dotenv" version = "1.0.1" @@ -1393,126 +1980,46 @@ socks = ["PySocks (>=1.5.6,!=1.5.7)"] use-chardet-on-py3 = ["chardet (>=3.0.2,<6)"] [[package]] -name = "safetensors" -version = "0.4.3" -description = "" -optional = false -python-versions = ">=3.7" +name = "rsa" +version = "4.9" +description = "Pure-Python RSA implementation" +optional = true +python-versions = ">=3.6,<4" files = [ - {file = "safetensors-0.4.3-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:dcf5705cab159ce0130cd56057f5f3425023c407e170bca60b4868048bae64fd"}, - {file = "safetensors-0.4.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:bb4f8c5d0358a31e9a08daeebb68f5e161cdd4018855426d3f0c23bb51087055"}, - {file = "safetensors-0.4.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:70a5319ef409e7f88686a46607cbc3c428271069d8b770076feaf913664a07ac"}, - {file = "safetensors-0.4.3-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:fb9c65bd82f9ef3ce4970dc19ee86be5f6f93d032159acf35e663c6bea02b237"}, - {file = "safetensors-0.4.3-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:edb5698a7bc282089f64c96c477846950358a46ede85a1c040e0230344fdde10"}, - {file = "safetensors-0.4.3-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:efcc860be094b8d19ac61b452ec635c7acb9afa77beb218b1d7784c6d41fe8ad"}, - {file = "safetensors-0.4.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d88b33980222085dd6001ae2cad87c6068e0991d4f5ccf44975d216db3b57376"}, - {file = "safetensors-0.4.3-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:5fc6775529fb9f0ce2266edd3e5d3f10aab068e49f765e11f6f2a63b5367021d"}, - {file = "safetensors-0.4.3-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:9c6ad011c1b4e3acff058d6b090f1da8e55a332fbf84695cf3100c649cc452d1"}, - {file = "safetensors-0.4.3-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:8c496c5401c1b9c46d41a7688e8ff5b0310a3b9bae31ce0f0ae870e1ea2b8caf"}, - {file = "safetensors-0.4.3-cp310-none-win32.whl", hash = "sha256:38e2a8666178224a51cca61d3cb4c88704f696eac8f72a49a598a93bbd8a4af9"}, - {file = "safetensors-0.4.3-cp310-none-win_amd64.whl", hash = "sha256:393e6e391467d1b2b829c77e47d726f3b9b93630e6a045b1d1fca67dc78bf632"}, - {file = "safetensors-0.4.3-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:22f3b5d65e440cec0de8edaa672efa888030802e11c09b3d6203bff60ebff05a"}, - {file = "safetensors-0.4.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7c4fa560ebd4522adddb71dcd25d09bf211b5634003f015a4b815b7647d62ebe"}, - {file = "safetensors-0.4.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e9afd5358719f1b2cf425fad638fc3c887997d6782da317096877e5b15b2ce93"}, - {file = "safetensors-0.4.3-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:d8c5093206ef4b198600ae484230402af6713dab1bd5b8e231905d754022bec7"}, - {file = "safetensors-0.4.3-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:e0b2104df1579d6ba9052c0ae0e3137c9698b2d85b0645507e6fd1813b70931a"}, - {file = "safetensors-0.4.3-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8cf18888606dad030455d18f6c381720e57fc6a4170ee1966adb7ebc98d4d6a3"}, - {file = "safetensors-0.4.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0bf4f9d6323d9f86eef5567eabd88f070691cf031d4c0df27a40d3b4aaee755b"}, - {file = "safetensors-0.4.3-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:585c9ae13a205807b63bef8a37994f30c917ff800ab8a1ca9c9b5d73024f97ee"}, - {file = "safetensors-0.4.3-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:faefeb3b81bdfb4e5a55b9bbdf3d8d8753f65506e1d67d03f5c851a6c87150e9"}, - {file = "safetensors-0.4.3-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:befdf0167ad626f22f6aac6163477fcefa342224a22f11fdd05abb3995c1783c"}, - {file = "safetensors-0.4.3-cp311-none-win32.whl", hash = "sha256:a7cef55929dcbef24af3eb40bedec35d82c3c2fa46338bb13ecf3c5720af8a61"}, - {file = "safetensors-0.4.3-cp311-none-win_amd64.whl", hash = "sha256:840b7ac0eff5633e1d053cc9db12fdf56b566e9403b4950b2dc85393d9b88d67"}, - {file = "safetensors-0.4.3-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:22d21760dc6ebae42e9c058d75aa9907d9f35e38f896e3c69ba0e7b213033856"}, - {file = "safetensors-0.4.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:8d22c1a10dff3f64d0d68abb8298a3fd88ccff79f408a3e15b3e7f637ef5c980"}, - {file = "safetensors-0.4.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b1648568667f820b8c48317c7006221dc40aced1869908c187f493838a1362bc"}, - {file = "safetensors-0.4.3-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:446e9fe52c051aeab12aac63d1017e0f68a02a92a027b901c4f8e931b24e5397"}, - {file = "safetensors-0.4.3-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:fef5d70683643618244a4f5221053567ca3e77c2531e42ad48ae05fae909f542"}, - {file = "safetensors-0.4.3-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2a1f4430cc0c9d6afa01214a4b3919d0a029637df8e09675ceef1ca3f0dfa0df"}, - {file = "safetensors-0.4.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2d603846a8585b9432a0fd415db1d4c57c0f860eb4aea21f92559ff9902bae4d"}, - {file = "safetensors-0.4.3-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:a844cdb5d7cbc22f5f16c7e2a0271170750763c4db08381b7f696dbd2c78a361"}, - {file = "safetensors-0.4.3-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:88887f69f7a00cf02b954cdc3034ffb383b2303bc0ab481d4716e2da51ddc10e"}, - {file = "safetensors-0.4.3-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:ee463219d9ec6c2be1d331ab13a8e0cd50d2f32240a81d498266d77d07b7e71e"}, - {file = "safetensors-0.4.3-cp312-none-win32.whl", hash = "sha256:d0dd4a1db09db2dba0f94d15addc7e7cd3a7b0d393aa4c7518c39ae7374623c3"}, - {file = "safetensors-0.4.3-cp312-none-win_amd64.whl", hash = "sha256:d14d30c25897b2bf19b6fb5ff7e26cc40006ad53fd4a88244fdf26517d852dd7"}, - {file = "safetensors-0.4.3-cp37-cp37m-macosx_10_12_x86_64.whl", hash = "sha256:d1456f814655b224d4bf6e7915c51ce74e389b413be791203092b7ff78c936dd"}, - {file = "safetensors-0.4.3-cp37-cp37m-macosx_11_0_arm64.whl", hash = "sha256:455d538aa1aae4a8b279344a08136d3f16334247907b18a5c3c7fa88ef0d3c46"}, - {file = "safetensors-0.4.3-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cf476bca34e1340ee3294ef13e2c625833f83d096cfdf69a5342475602004f95"}, - {file = "safetensors-0.4.3-cp37-cp37m-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:02ef3a24face643456020536591fbd3c717c5abaa2737ec428ccbbc86dffa7a4"}, - {file = "safetensors-0.4.3-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:7de32d0d34b6623bb56ca278f90db081f85fb9c5d327e3c18fd23ac64f465768"}, - {file = "safetensors-0.4.3-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2a0deb16a1d3ea90c244ceb42d2c6c276059616be21a19ac7101aa97da448faf"}, - {file = "safetensors-0.4.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c59d51f182c729f47e841510b70b967b0752039f79f1de23bcdd86462a9b09ee"}, - {file = "safetensors-0.4.3-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1f598b713cc1a4eb31d3b3203557ac308acf21c8f41104cdd74bf640c6e538e3"}, - {file = "safetensors-0.4.3-cp37-cp37m-musllinux_1_1_aarch64.whl", hash = "sha256:5757e4688f20df083e233b47de43845d1adb7e17b6cf7da5f8444416fc53828d"}, - {file = "safetensors-0.4.3-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:fe746d03ed8d193674a26105e4f0fe6c726f5bb602ffc695b409eaf02f04763d"}, - {file = "safetensors-0.4.3-cp37-none-win32.whl", hash = "sha256:0d5ffc6a80f715c30af253e0e288ad1cd97a3d0086c9c87995e5093ebc075e50"}, - {file = "safetensors-0.4.3-cp37-none-win_amd64.whl", hash = "sha256:a11c374eb63a9c16c5ed146457241182f310902bd2a9c18255781bb832b6748b"}, - {file = "safetensors-0.4.3-cp38-cp38-macosx_10_12_x86_64.whl", hash = "sha256:b1e31be7945f66be23f4ec1682bb47faa3df34cb89fc68527de6554d3c4258a4"}, - {file = "safetensors-0.4.3-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:03a4447c784917c9bf01d8f2ac5080bc15c41692202cd5f406afba16629e84d6"}, - {file = "safetensors-0.4.3-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d244bcafeb1bc06d47cfee71727e775bca88a8efda77a13e7306aae3813fa7e4"}, - {file = "safetensors-0.4.3-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:53c4879b9c6bd7cd25d114ee0ef95420e2812e676314300624594940a8d6a91f"}, - {file = "safetensors-0.4.3-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:74707624b81f1b7f2b93f5619d4a9f00934d5948005a03f2c1845ffbfff42212"}, - {file = "safetensors-0.4.3-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0d52c958dc210265157573f81d34adf54e255bc2b59ded6218500c9b15a750eb"}, - {file = "safetensors-0.4.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6f9568f380f513a60139971169c4a358b8731509cc19112369902eddb33faa4d"}, - {file = "safetensors-0.4.3-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:0d9cd8e1560dfc514b6d7859247dc6a86ad2f83151a62c577428d5102d872721"}, - {file = "safetensors-0.4.3-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:89f9f17b0dacb913ed87d57afbc8aad85ea42c1085bd5de2f20d83d13e9fc4b2"}, - {file = "safetensors-0.4.3-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:1139eb436fd201c133d03c81209d39ac57e129f5e74e34bb9ab60f8d9b726270"}, - {file = "safetensors-0.4.3-cp38-none-win32.whl", hash = "sha256:d9c289f140a9ae4853fc2236a2ffc9a9f2d5eae0cb673167e0f1b8c18c0961ac"}, - {file = "safetensors-0.4.3-cp38-none-win_amd64.whl", hash = "sha256:622afd28968ef3e9786562d352659a37de4481a4070f4ebac883f98c5836563e"}, - {file = "safetensors-0.4.3-cp39-cp39-macosx_10_12_x86_64.whl", hash = "sha256:8651c7299cbd8b4161a36cd6a322fa07d39cd23535b144d02f1c1972d0c62f3c"}, - {file = "safetensors-0.4.3-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:e375d975159ac534c7161269de24ddcd490df2157b55c1a6eeace6cbb56903f0"}, - {file = "safetensors-0.4.3-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:084fc436e317f83f7071fc6a62ca1c513b2103db325cd09952914b50f51cf78f"}, - {file = "safetensors-0.4.3-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:41a727a7f5e6ad9f1db6951adee21bbdadc632363d79dc434876369a17de6ad6"}, - {file = "safetensors-0.4.3-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:e7dbbde64b6c534548696808a0e01276d28ea5773bc9a2dfb97a88cd3dffe3df"}, - {file = "safetensors-0.4.3-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:bbae3b4b9d997971431c346edbfe6e41e98424a097860ee872721e176040a893"}, - {file = "safetensors-0.4.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:01e4b22e3284cd866edeabe4f4d896229495da457229408d2e1e4810c5187121"}, - {file = "safetensors-0.4.3-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:0dd37306546b58d3043eb044c8103a02792cc024b51d1dd16bd3dd1f334cb3ed"}, - {file = "safetensors-0.4.3-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:d8815b5e1dac85fc534a97fd339e12404db557878c090f90442247e87c8aeaea"}, - {file = "safetensors-0.4.3-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:e011cc162503c19f4b1fd63dfcddf73739c7a243a17dac09b78e57a00983ab35"}, - {file = "safetensors-0.4.3-cp39-none-win32.whl", hash = "sha256:01feb3089e5932d7e662eda77c3ecc389f97c0883c4a12b5cfdc32b589a811c3"}, - {file = "safetensors-0.4.3-cp39-none-win_amd64.whl", hash = "sha256:3f9cdca09052f585e62328c1c2923c70f46814715c795be65f0b93f57ec98a02"}, - {file = "safetensors-0.4.3-pp310-pypy310_pp73-macosx_10_12_x86_64.whl", hash = "sha256:1b89381517891a7bb7d1405d828b2bf5d75528299f8231e9346b8eba092227f9"}, - {file = "safetensors-0.4.3-pp310-pypy310_pp73-macosx_11_0_arm64.whl", hash = "sha256:cd6fff9e56df398abc5866b19a32124815b656613c1c5ec0f9350906fd798aac"}, - {file = "safetensors-0.4.3-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:840caf38d86aa7014fe37ade5d0d84e23dcfbc798b8078015831996ecbc206a3"}, - {file = "safetensors-0.4.3-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f9650713b2cfa9537a2baf7dd9fee458b24a0aaaa6cafcea8bdd5fb2b8efdc34"}, - {file = "safetensors-0.4.3-pp310-pypy310_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:e4119532cd10dba04b423e0f86aecb96cfa5a602238c0aa012f70c3a40c44b50"}, - {file = "safetensors-0.4.3-pp310-pypy310_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:e066e8861eef6387b7c772344d1fe1f9a72800e04ee9a54239d460c400c72aab"}, - {file = "safetensors-0.4.3-pp310-pypy310_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:90964917f5b0fa0fa07e9a051fbef100250c04d150b7026ccbf87a34a54012e0"}, - {file = "safetensors-0.4.3-pp37-pypy37_pp73-macosx_10_12_x86_64.whl", hash = "sha256:c41e1893d1206aa7054029681778d9a58b3529d4c807002c156d58426c225173"}, - {file = "safetensors-0.4.3-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ae7613a119a71a497d012ccc83775c308b9c1dab454806291427f84397d852fd"}, - {file = "safetensors-0.4.3-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4f9bac020faba7f5dc481e881b14b6425265feabb5bfc552551d21189c0eddc3"}, - {file = "safetensors-0.4.3-pp37-pypy37_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:420a98f593ff9930f5822560d14c395ccbc57342ddff3b463bc0b3d6b1951550"}, - {file = "safetensors-0.4.3-pp37-pypy37_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:f5e6883af9a68c0028f70a4c19d5a6ab6238a379be36ad300a22318316c00cb0"}, - {file = "safetensors-0.4.3-pp37-pypy37_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:cdd0a3b5da66e7f377474599814dbf5cbf135ff059cc73694de129b58a5e8a2c"}, - {file = "safetensors-0.4.3-pp38-pypy38_pp73-macosx_10_12_x86_64.whl", hash = "sha256:9bfb92f82574d9e58401d79c70c716985dc049b635fef6eecbb024c79b2c46ad"}, - {file = "safetensors-0.4.3-pp38-pypy38_pp73-macosx_11_0_arm64.whl", hash = "sha256:3615a96dd2dcc30eb66d82bc76cda2565f4f7bfa89fcb0e31ba3cea8a1a9ecbb"}, - {file = "safetensors-0.4.3-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:868ad1b6fc41209ab6bd12f63923e8baeb1a086814cb2e81a65ed3d497e0cf8f"}, - {file = "safetensors-0.4.3-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b7ffba80aa49bd09195145a7fd233a7781173b422eeb995096f2b30591639517"}, - {file = "safetensors-0.4.3-pp38-pypy38_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:c0acbe31340ab150423347e5b9cc595867d814244ac14218932a5cf1dd38eb39"}, - {file = "safetensors-0.4.3-pp38-pypy38_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:19bbdf95de2cf64f25cd614c5236c8b06eb2cfa47cbf64311f4b5d80224623a3"}, - {file = "safetensors-0.4.3-pp38-pypy38_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:b852e47eb08475c2c1bd8131207b405793bfc20d6f45aff893d3baaad449ed14"}, - {file = "safetensors-0.4.3-pp39-pypy39_pp73-macosx_10_12_x86_64.whl", hash = "sha256:5d07cbca5b99babb692d76d8151bec46f461f8ad8daafbfd96b2fca40cadae65"}, - {file = "safetensors-0.4.3-pp39-pypy39_pp73-macosx_11_0_arm64.whl", hash = "sha256:1ab6527a20586d94291c96e00a668fa03f86189b8a9defa2cdd34a1a01acc7d5"}, - {file = "safetensors-0.4.3-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:02318f01e332cc23ffb4f6716e05a492c5f18b1d13e343c49265149396284a44"}, - {file = "safetensors-0.4.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ec4b52ce9a396260eb9731eb6aea41a7320de22ed73a1042c2230af0212758ce"}, - {file = "safetensors-0.4.3-pp39-pypy39_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:018b691383026a2436a22b648873ed11444a364324e7088b99cd2503dd828400"}, - {file = "safetensors-0.4.3-pp39-pypy39_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:309b10dbcab63269ecbf0e2ca10ce59223bb756ca5d431ce9c9eeabd446569da"}, - {file = "safetensors-0.4.3-pp39-pypy39_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:b277482120df46e27a58082df06a15aebda4481e30a1c21eefd0921ae7e03f65"}, - {file = "safetensors-0.4.3.tar.gz", hash = "sha256:2f85fc50c4e07a21e95c24e07460fe6f7e2859d0ce88092838352b798ce711c2"}, + {file = "rsa-4.9-py3-none-any.whl", hash = "sha256:90260d9058e514786967344d0ef75fa8727eed8a7d2e43ce9f4bcf1b536174f7"}, + {file = "rsa-4.9.tar.gz", hash = "sha256:e38464a49c6c85d7f1351b0126661487a7e0a14a50f1675ec50eb34d4f20ef21"}, ] +[package.dependencies] +pyasn1 = ">=0.1.3" + +[[package]] +name = "s3transfer" +version = "0.10.2" +description = "An Amazon S3 Transfer Manager" +optional = true +python-versions = ">=3.8" +files = [ + {file = "s3transfer-0.10.2-py3-none-any.whl", hash = "sha256:eca1c20de70a39daee580aef4986996620f365c4e0fda6a86100231d62f1bf69"}, + {file = "s3transfer-0.10.2.tar.gz", hash = "sha256:0711534e9356d3cc692fdde846b4a1e4b0cb6519971860796e6bc4c7aea00ef6"}, +] + +[package.dependencies] +botocore = ">=1.33.2,<2.0a.0" + [package.extras] -all = ["safetensors[jax]", "safetensors[numpy]", "safetensors[paddlepaddle]", "safetensors[pinned-tf]", "safetensors[quality]", "safetensors[testing]", "safetensors[torch]"] -dev = ["safetensors[all]"] -jax = ["flax (>=0.6.3)", "jax (>=0.3.25)", "jaxlib (>=0.3.25)", "safetensors[numpy]"] -mlx = ["mlx (>=0.0.9)"] -numpy = ["numpy (>=1.21.6)"] -paddlepaddle = ["paddlepaddle (>=2.4.1)", "safetensors[numpy]"] -pinned-tf = ["safetensors[numpy]", "tensorflow (==2.11.0)"] -quality = ["black (==22.3)", "click (==8.0.4)", "flake8 (>=3.8.3)", "isort (>=5.5.4)"] -tensorflow = ["safetensors[numpy]", "tensorflow (>=2.11.0)"] -testing = ["h5py (>=3.7.0)", "huggingface-hub (>=0.12.1)", "hypothesis (>=6.70.2)", "pytest (>=7.2.0)", "pytest-benchmark (>=4.0.0)", "safetensors[numpy]", "setuptools-rust (>=1.5.2)"] -torch = ["safetensors[numpy]", "torch (>=1.10)"] +crt = ["botocore[crt] (>=1.33.2,<2.0a.0)"] + +[[package]] +name = "six" +version = "1.16.0" +description = "Python 2 and 3 compatibility utilities" +optional = true +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*" +files = [ + {file = "six-1.16.0-py2.py3-none-any.whl", hash = "sha256:8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254"}, + {file = "six-1.16.0.tar.gz", hash = "sha256:1e61c37477a1626458e36f7b1d82aa5c9b094fa4802892072e49de9c60c4c926"}, +] [[package]] name = "sniffio" @@ -1529,7 +2036,7 @@ files = [ name = "sqlalchemy" version = "2.0.31" description = "Database Abstraction Library" -optional = false +optional = true python-versions = ">=3.7" files = [ {file = "SQLAlchemy-2.0.31-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:f2a213c1b699d3f5768a7272de720387ae0122f1becf0901ed6eaa1abd1baf6c"}, @@ -1698,7 +2205,7 @@ blobfile = ["blobfile (>=2)"] name = "tokenizers" version = "0.19.1" description = "" -optional = false +optional = true python-versions = ">=3.7" files = [ {file = "tokenizers-0.19.1-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:952078130b3d101e05ecfc7fc3640282d74ed26bcf691400f872563fca15ac97"}, @@ -1896,74 +2403,6 @@ notebook = ["ipywidgets (>=6)"] slack = ["slack-sdk"] telegram = ["requests"] -[[package]] -name = "transformers" -version = "4.42.3" -description = "State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow" -optional = false -python-versions = ">=3.8.0" -files = [ - {file = "transformers-4.42.3-py3-none-any.whl", hash = "sha256:a61a0df9609b7d69229d941b2fd857c841ba3043d6da503d0da1a4b133f65b92"}, - {file = "transformers-4.42.3.tar.gz", hash = "sha256:7539873ff45809145265cbc94ea4619d2713c41ceaa277b692d8b0be3430f7eb"}, -] - -[package.dependencies] -filelock = "*" -huggingface-hub = ">=0.23.2,<1.0" -numpy = ">=1.17,<2.0" -packaging = ">=20.0" -pyyaml = ">=5.1" -regex = "!=2019.12.17" -requests = "*" -safetensors = ">=0.4.1" -tokenizers = ">=0.19,<0.20" -tqdm = ">=4.27" - -[package.extras] -accelerate = ["accelerate (>=0.21.0)"] -agents = ["Pillow (>=10.0.1,<=15.0)", "accelerate (>=0.21.0)", "datasets (!=2.5.0)", "diffusers", "opencv-python", "sentencepiece (>=0.1.91,!=0.1.92)", "torch"] -all = ["Pillow (>=10.0.1,<=15.0)", "accelerate (>=0.21.0)", "av (==9.2.0)", "codecarbon (==1.2.0)", "decord (==0.6.0)", "flax (>=0.4.1,<=0.7.0)", "jax (>=0.4.1,<=0.4.13)", "jaxlib (>=0.4.1,<=0.4.13)", "kenlm", "keras-nlp (>=0.3.1)", "librosa", "onnxconverter-common", "optax (>=0.0.8,<=0.1.4)", "optuna", "phonemizer", "protobuf", "pyctcdecode (>=0.4.0)", "ray[tune] (>=2.7.0)", "scipy (<1.13.0)", "sentencepiece (>=0.1.91,!=0.1.92)", "sigopt", "tensorflow (>2.9,<2.16)", "tensorflow-text (<2.16)", "tf2onnx", "timm (<=0.9.16)", "tokenizers (>=0.19,<0.20)", "torch", "torchaudio", "torchvision"] -audio = ["kenlm", "librosa", "phonemizer", "pyctcdecode (>=0.4.0)"] -benchmark = ["optimum-benchmark (>=0.2.0)"] -codecarbon = ["codecarbon (==1.2.0)"] -deepspeed = ["accelerate (>=0.21.0)", "deepspeed (>=0.9.3)"] -deepspeed-testing = ["GitPython (<3.1.19)", "accelerate (>=0.21.0)", "beautifulsoup4", "cookiecutter (==1.7.3)", "datasets (!=2.5.0)", "deepspeed (>=0.9.3)", "dill (<0.3.5)", "evaluate (>=0.2.0)", "faiss-cpu", "nltk", "optuna", "parameterized", "protobuf", "psutil", "pydantic", "pytest (>=7.2.0,<8.0.0)", "pytest-rich", "pytest-timeout", "pytest-xdist", "rjieba", "rouge-score (!=0.0.7,!=0.0.8,!=0.1,!=0.1.1)", "ruff (==0.4.4)", "sacrebleu (>=1.4.12,<2.0.0)", "sacremoses", "sentencepiece (>=0.1.91,!=0.1.92)", "tensorboard", "timeout-decorator"] -dev = ["GitPython (<3.1.19)", "Pillow (>=10.0.1,<=15.0)", "accelerate (>=0.21.0)", "av (==9.2.0)", "beautifulsoup4", "codecarbon (==1.2.0)", "cookiecutter (==1.7.3)", "datasets (!=2.5.0)", "decord (==0.6.0)", "dill (<0.3.5)", "evaluate (>=0.2.0)", "faiss-cpu", "flax (>=0.4.1,<=0.7.0)", "fugashi (>=1.0)", "ipadic (>=1.0.0,<2.0)", "isort (>=5.5.4)", "jax (>=0.4.1,<=0.4.13)", "jaxlib (>=0.4.1,<=0.4.13)", "kenlm", "keras-nlp (>=0.3.1)", "librosa", "nltk", "onnxconverter-common", "optax (>=0.0.8,<=0.1.4)", "optuna", "parameterized", "phonemizer", "protobuf", "psutil", "pyctcdecode (>=0.4.0)", "pydantic", "pytest (>=7.2.0,<8.0.0)", "pytest-rich", "pytest-timeout", "pytest-xdist", "ray[tune] (>=2.7.0)", "rhoknp (>=1.1.0,<1.3.1)", "rjieba", "rouge-score (!=0.0.7,!=0.0.8,!=0.1,!=0.1.1)", "ruff (==0.4.4)", "sacrebleu (>=1.4.12,<2.0.0)", "sacremoses", "scikit-learn", "scipy (<1.13.0)", "sentencepiece (>=0.1.91,!=0.1.92)", "sigopt", "sudachidict-core (>=20220729)", "sudachipy (>=0.6.6)", "tensorboard", "tensorflow (>2.9,<2.16)", "tensorflow-text (<2.16)", "tf2onnx", "timeout-decorator", "timm (<=0.9.16)", "tokenizers (>=0.19,<0.20)", "torch", "torchaudio", "torchvision", "unidic (>=1.0.2)", "unidic-lite (>=1.0.7)", "urllib3 (<2.0.0)"] -dev-tensorflow = ["GitPython (<3.1.19)", "Pillow (>=10.0.1,<=15.0)", "beautifulsoup4", "cookiecutter (==1.7.3)", "datasets (!=2.5.0)", "dill (<0.3.5)", "evaluate (>=0.2.0)", "faiss-cpu", "isort (>=5.5.4)", "kenlm", "keras-nlp (>=0.3.1)", "librosa", "nltk", "onnxconverter-common", "onnxruntime (>=1.4.0)", "onnxruntime-tools (>=1.4.2)", "parameterized", "phonemizer", "protobuf", "psutil", "pyctcdecode (>=0.4.0)", "pydantic", "pytest (>=7.2.0,<8.0.0)", "pytest-rich", "pytest-timeout", "pytest-xdist", "rjieba", "rouge-score (!=0.0.7,!=0.0.8,!=0.1,!=0.1.1)", "ruff (==0.4.4)", "sacrebleu (>=1.4.12,<2.0.0)", "sacremoses", "scikit-learn", "sentencepiece (>=0.1.91,!=0.1.92)", "tensorboard", "tensorflow (>2.9,<2.16)", "tensorflow-text (<2.16)", "tf2onnx", "timeout-decorator", "tokenizers (>=0.19,<0.20)", "urllib3 (<2.0.0)"] -dev-torch = ["GitPython (<3.1.19)", "Pillow (>=10.0.1,<=15.0)", "accelerate (>=0.21.0)", "beautifulsoup4", "codecarbon (==1.2.0)", "cookiecutter (==1.7.3)", "datasets (!=2.5.0)", "dill (<0.3.5)", "evaluate (>=0.2.0)", "faiss-cpu", "fugashi (>=1.0)", "ipadic (>=1.0.0,<2.0)", "isort (>=5.5.4)", "kenlm", "librosa", "nltk", "onnxruntime (>=1.4.0)", "onnxruntime-tools (>=1.4.2)", "optuna", "parameterized", "phonemizer", "protobuf", "psutil", "pyctcdecode (>=0.4.0)", "pydantic", "pytest (>=7.2.0,<8.0.0)", "pytest-rich", "pytest-timeout", "pytest-xdist", "ray[tune] (>=2.7.0)", "rhoknp (>=1.1.0,<1.3.1)", "rjieba", "rouge-score (!=0.0.7,!=0.0.8,!=0.1,!=0.1.1)", "ruff (==0.4.4)", "sacrebleu (>=1.4.12,<2.0.0)", "sacremoses", "scikit-learn", "sentencepiece (>=0.1.91,!=0.1.92)", "sigopt", "sudachidict-core (>=20220729)", "sudachipy (>=0.6.6)", "tensorboard", "timeout-decorator", "timm (<=0.9.16)", "tokenizers (>=0.19,<0.20)", "torch", "torchaudio", "torchvision", "unidic (>=1.0.2)", "unidic-lite (>=1.0.7)", "urllib3 (<2.0.0)"] -flax = ["flax (>=0.4.1,<=0.7.0)", "jax (>=0.4.1,<=0.4.13)", "jaxlib (>=0.4.1,<=0.4.13)", "optax (>=0.0.8,<=0.1.4)", "scipy (<1.13.0)"] -flax-speech = ["kenlm", "librosa", "phonemizer", "pyctcdecode (>=0.4.0)"] -ftfy = ["ftfy"] -integrations = ["optuna", "ray[tune] (>=2.7.0)", "sigopt"] -ja = ["fugashi (>=1.0)", "ipadic (>=1.0.0,<2.0)", "rhoknp (>=1.1.0,<1.3.1)", "sudachidict-core (>=20220729)", "sudachipy (>=0.6.6)", "unidic (>=1.0.2)", "unidic-lite (>=1.0.7)"] -modelcreation = ["cookiecutter (==1.7.3)"] -natten = ["natten (>=0.14.6,<0.15.0)"] -onnx = ["onnxconverter-common", "onnxruntime (>=1.4.0)", "onnxruntime-tools (>=1.4.2)", "tf2onnx"] -onnxruntime = ["onnxruntime (>=1.4.0)", "onnxruntime-tools (>=1.4.2)"] -optuna = ["optuna"] -quality = ["GitPython (<3.1.19)", "datasets (!=2.5.0)", "isort (>=5.5.4)", "ruff (==0.4.4)", "urllib3 (<2.0.0)"] -ray = ["ray[tune] (>=2.7.0)"] -retrieval = ["datasets (!=2.5.0)", "faiss-cpu"] -ruff = ["ruff (==0.4.4)"] -sagemaker = ["sagemaker (>=2.31.0)"] -sentencepiece = ["protobuf", "sentencepiece (>=0.1.91,!=0.1.92)"] -serving = ["fastapi", "pydantic", "starlette", "uvicorn"] -sigopt = ["sigopt"] -sklearn = ["scikit-learn"] -speech = ["kenlm", "librosa", "phonemizer", "pyctcdecode (>=0.4.0)", "torchaudio"] -testing = ["GitPython (<3.1.19)", "beautifulsoup4", "cookiecutter (==1.7.3)", "datasets (!=2.5.0)", "dill (<0.3.5)", "evaluate (>=0.2.0)", "faiss-cpu", "nltk", "parameterized", "psutil", "pydantic", "pytest (>=7.2.0,<8.0.0)", "pytest-rich", "pytest-timeout", "pytest-xdist", "rjieba", "rouge-score (!=0.0.7,!=0.0.8,!=0.1,!=0.1.1)", "ruff (==0.4.4)", "sacrebleu (>=1.4.12,<2.0.0)", "sacremoses", "sentencepiece (>=0.1.91,!=0.1.92)", "tensorboard", "timeout-decorator"] -tf = ["keras-nlp (>=0.3.1)", "onnxconverter-common", "tensorflow (>2.9,<2.16)", "tensorflow-text (<2.16)", "tf2onnx"] -tf-cpu = ["keras (>2.9,<2.16)", "keras-nlp (>=0.3.1)", "onnxconverter-common", "tensorflow-cpu (>2.9,<2.16)", "tensorflow-probability (<0.24)", "tensorflow-text (<2.16)", "tf2onnx"] -tf-speech = ["kenlm", "librosa", "phonemizer", "pyctcdecode (>=0.4.0)"] -timm = ["timm (<=0.9.16)"] -tokenizers = ["tokenizers (>=0.19,<0.20)"] -torch = ["accelerate (>=0.21.0)", "torch"] -torch-speech = ["kenlm", "librosa", "phonemizer", "pyctcdecode (>=0.4.0)", "torchaudio"] -torch-vision = ["Pillow (>=10.0.1,<=15.0)", "torchvision"] -torchhub = ["filelock", "huggingface-hub (>=0.23.2,<1.0)", "importlib-metadata", "numpy (>=1.17,<2.0)", "packaging (>=20.0)", "protobuf", "regex (!=2019.12.17)", "requests", "sentencepiece (>=0.1.91,!=0.1.92)", "tokenizers (>=0.19,<0.20)", "torch", "tqdm (>=4.27)"] -video = ["av (==9.2.0)", "decord (==0.6.0)"] -vision = ["Pillow (>=10.0.1,<=15.0)"] - [[package]] name = "triton" version = "2.3.1" @@ -1998,6 +2437,45 @@ files = [ {file = "types_PyYAML-6.0.12.20240311-py3-none-any.whl", hash = "sha256:b845b06a1c7e54b8e5b4c683043de0d9caf205e7434b3edc678ff2411979b8f6"}, ] +[[package]] +name = "types-requests" +version = "2.31.0.6" +description = "Typing stubs for requests" +optional = true +python-versions = ">=3.7" +files = [ + {file = "types-requests-2.31.0.6.tar.gz", hash = "sha256:cd74ce3b53c461f1228a9b783929ac73a666658f223e28ed29753771477b3bd0"}, + {file = "types_requests-2.31.0.6-py3-none-any.whl", hash = "sha256:a2db9cb228a81da8348b49ad6db3f5519452dd20a9c1e1a868c83c5fe88fd1a9"}, +] + +[package.dependencies] +types-urllib3 = "*" + +[[package]] +name = "types-requests" +version = "2.32.0.20240712" +description = "Typing stubs for requests" +optional = true +python-versions = ">=3.8" +files = [ + {file = "types-requests-2.32.0.20240712.tar.gz", hash = "sha256:90c079ff05e549f6bf50e02e910210b98b8ff1ebdd18e19c873cd237737c1358"}, + {file = "types_requests-2.32.0.20240712-py3-none-any.whl", hash = "sha256:f754283e152c752e46e70942fa2a146b5bc70393522257bb85bd1ef7e019dcc3"}, +] + +[package.dependencies] +urllib3 = ">=2" + +[[package]] +name = "types-urllib3" +version = "1.26.25.14" +description = "Typing stubs for urllib3" +optional = true +python-versions = "*" +files = [ + {file = "types-urllib3-1.26.25.14.tar.gz", hash = "sha256:229b7f577c951b8c1b92c1bc2b2fdb0b49847bd2af6d1cc2a2e3dd340f3bda8f"}, + {file = "types_urllib3-1.26.25.14-py3-none-any.whl", hash = "sha256:9683bbb7fb72e32bfe9d2be6e04875fbe1b3eeec3cbb4ea231435aa7fd6b4f0e"}, +] + [[package]] name = "typing-extensions" version = "4.12.2" @@ -2009,6 +2487,33 @@ files = [ {file = "typing_extensions-4.12.2.tar.gz", hash = "sha256:1a7ead55c7e559dd4dee8856e3a88b41225abfe1ce8df57b7c13915fe121ffb8"}, ] +[[package]] +name = "uritemplate" +version = "4.1.1" +description = "Implementation of RFC 6570 URI Templates" +optional = true +python-versions = ">=3.6" +files = [ + {file = "uritemplate-4.1.1-py2.py3-none-any.whl", hash = "sha256:830c08b8d99bdd312ea4ead05994a38e8936266f84b9a7878232db50b044e02e"}, + {file = "uritemplate-4.1.1.tar.gz", hash = "sha256:4346edfc5c3b79f694bccd6d6099a322bbeb628dbf2cd86eea55a456ce5124f0"}, +] + +[[package]] +name = "urllib3" +version = "1.26.19" +description = "HTTP library with thread-safe connection pooling, file post, and more." +optional = false +python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,>=2.7" +files = [ + {file = "urllib3-1.26.19-py2.py3-none-any.whl", hash = "sha256:37a0344459b199fce0e80b0d3569837ec6b6937435c5244e7fd73fa6006830f3"}, + {file = "urllib3-1.26.19.tar.gz", hash = "sha256:3e3d753a8618b86d7de333b4223005f68720bcd6a7d2bcb9fbd2229ec7c1e429"}, +] + +[package.extras] +brotli = ["brotli (==1.0.9)", "brotli (>=1.0.9)", "brotlicffi (>=0.8.0)", "brotlipy (>=0.6.0)"] +secure = ["certifi", "cryptography (>=1.3.4)", "idna (>=2.0.0)", "ipaddress", "pyOpenSSL (>=0.14)", "urllib3-secure-extra"] +socks = ["PySocks (>=1.5.6,!=1.5.7,<2.0)"] + [[package]] name = "urllib3" version = "2.2.2" @@ -2046,7 +2551,18 @@ platformdirs = ">=3.9.1,<5" docs = ["furo (>=2023.7.26)", "proselint (>=0.13)", "sphinx (>=7.1.2,!=7.3)", "sphinx-argparse (>=0.4)", "sphinxcontrib-towncrier (>=0.2.1a0)", "towncrier (>=23.6)"] test = ["covdefaults (>=2.3)", "coverage (>=7.2.7)", "coverage-enable-subprocess (>=1)", "flaky (>=3.7)", "packaging (>=23.1)", "pytest (>=7.4)", "pytest-env (>=0.8.2)", "pytest-freezer (>=0.4.8)", "pytest-mock (>=3.11.1)", "pytest-randomly (>=3.12)", "pytest-timeout (>=2.1)", "setuptools (>=68)", "time-machine (>=2.10)"] +[extras] +anthropic = ["anthropic"] +cohere = ["cohere"] +faiss-cpu = ["faiss-cpu"] +google-generativeai = ["google-generativeai"] +groq = ["groq"] +openai = ["openai"] +pgvector = ["pgvector"] +sqlalchemy = ["sqlalchemy"] +torch = ["torch"] + [metadata] lock-version = "2.0" python-versions = ">=3.9, <4.0" -content-hash = "933e0a902cffe3e2808a2337d0eeb6a2b566998c7bfb5e3b2edd8e289566cd39" +content-hash = "b0b647042d40d1088bae923979a8db0d4a51b494af0c6138fe1fdaa636ba1b7a" diff --git a/lightrag/pyproject.toml b/lightrag/pyproject.toml index de58e59d..88976544 100644 --- a/lightrag/pyproject.toml +++ b/lightrag/pyproject.toml @@ -1,7 +1,7 @@ [tool.poetry] name = "lightrag" -version = "0.0.0-beta.1" +version = "0.1.0-beta.2" description = "The Lightning Library for LLM Applications." authors = ["Li Yin "] readme = "README.md" @@ -43,24 +43,45 @@ numpy = "^1.26.4" tqdm = "^4.66.4" pyyaml = "^6.0.1" +# Optional dependencies +openai = { version = "^1.12.0", optional = true } +groq = { version = "^0.5.0", optional = true } +faiss-cpu = { version = "^1.8.0", optional = true } +sqlalchemy = { version = "^2.0.30", optional = true } +pgvector = { version = "^0.3.1", optional = true } +torch = { version = "^2.3.1", optional = true } +anthropic = { version = "^0.31.1", optional = true } +google-generativeai = { version = "^0.7.2", optional = true } +cohere = { version = "^5.5.8", optional = true } + [tool.poetry.group.test.dependencies] pytest = "^8.1.1" pytest-mock = "^3.14.0" -transformers = "^4.41.2" torch = "^2.3.1" +faiss-cpu = "^1.8.0" +openai = "^1.12.0" + [tool.poetry.group.typing.dependencies] mypy = "^1" types-pyyaml = "^6.0.12.20240311" # for mypy -[tool.poetry.group.dev.dependencies] +[tool.poetry.group.dev.dependencies] # specify the versions for extras pre-commit = "^3.7.0" -openai = "^1.12.0" -groq = "^0.5.0" # should only be installed if groq client is used -faiss-cpu = "^1.8.0" -sqlalchemy = "^2.0.30" + + +[tool.poetry.extras] # allow pip install lightrag[openai, groq] +openai = ["openai"] +groq = ["groq"] +anthropic = ["anthropic"] +cohere = ["cohere"] +google-generativeai = ["google-generativeai"] +pgvector = ["pgvector"] +faiss-cpu = ["faiss-cpu"] +sqlalchemy = ["sqlalchemy"] +torch = ["torch"] [tool.ruff] diff --git a/poetry.lock b/poetry.lock index f3243af7..d1ef6235 100644 --- a/poetry.lock +++ b/poetry.lock @@ -286,17 +286,17 @@ css = ["tinycss2 (>=1.1.0,<1.3)"] [[package]] name = "boto3" -version = "1.34.143" +version = "1.34.144" description = "The AWS SDK for Python" optional = false python-versions = ">=3.8" files = [ - {file = "boto3-1.34.143-py3-none-any.whl", hash = "sha256:0d16832f23e6bd3ae94e35ea8e625529850bfad9baccd426de96ad8f445d8e03"}, - {file = "boto3-1.34.143.tar.gz", hash = "sha256:b590ce80c65149194def43ebf0ea1cf0533945502507837389a8d22e3ecbcf05"}, + {file = "boto3-1.34.144-py3-none-any.whl", hash = "sha256:b8433d481d50b68a0162c0379c0dd4aabfc3d1ad901800beb5b87815997511c1"}, + {file = "boto3-1.34.144.tar.gz", hash = "sha256:2f3e88b10b8fcc5f6100a9d74cd28230edc9d4fa226d99dd40a3ab38ac213673"}, ] [package.dependencies] -botocore = ">=1.34.143,<1.35.0" +botocore = ">=1.34.144,<1.35.0" jmespath = ">=0.7.1,<2.0.0" s3transfer = ">=0.10.0,<0.11.0" @@ -305,13 +305,13 @@ crt = ["botocore[crt] (>=1.21.0,<2.0a0)"] [[package]] name = "botocore" -version = "1.34.143" +version = "1.34.144" description = "Low-level, data-driven core of boto 3." optional = false python-versions = ">=3.8" files = [ - {file = "botocore-1.34.143-py3-none-any.whl", hash = "sha256:094aea179e8aaa1bc957ad49cc27d93b189dd3a1f3075d8b0ca7c445a2a88430"}, - {file = "botocore-1.34.143.tar.gz", hash = "sha256:059f032ec05733a836e04e869c5a15534420102f93116f3bc9a5b759b0651caf"}, + {file = "botocore-1.34.144-py3-none-any.whl", hash = "sha256:a2cf26e1bf10d5917a2285e50257bc44e94a1d16574f282f3274f7a5d8d1f08b"}, + {file = "botocore-1.34.144.tar.gz", hash = "sha256:4215db28d25309d59c99507f1f77df9089e5bebbad35f6e19c7c44ec5383a3e8"}, ] [package.dependencies] @@ -324,13 +324,13 @@ crt = ["awscrt (==0.20.11)"] [[package]] name = "cachetools" -version = "5.3.3" +version = "5.4.0" description = "Extensible memoizing collections and decorators" optional = false python-versions = ">=3.7" files = [ - {file = "cachetools-5.3.3-py3-none-any.whl", hash = "sha256:0abad1021d3f8325b2fc1d2e9c8b9c9d57b04c3932657a72465447332c24d945"}, - {file = "cachetools-5.3.3.tar.gz", hash = "sha256:ba29e2dfa0b8b556606f097407ed1aa62080ee108ab0dc5ec9d6a723a007d105"}, + {file = "cachetools-5.4.0-py3-none-any.whl", hash = "sha256:3ae3b49a3d5e28a77a0be2b37dbcb89005058959cb2323858c2657c4a8cab474"}, + {file = "cachetools-5.4.0.tar.gz", hash = "sha256:b8adc2e7c07f105ced7bc56dbb6dfbe7c4a00acce20e2227b3f355be89bc6827"}, ] [[package]] @@ -1229,61 +1229,61 @@ typing-extensions = ">=4.7,<5" [[package]] name = "grpcio" -version = "1.65.0" +version = "1.64.1" description = "HTTP/2-based RPC framework" optional = false python-versions = ">=3.8" files = [ - {file = "grpcio-1.65.0-cp310-cp310-linux_armv7l.whl", hash = "sha256:66ea0ca6108fcb391444bb7b37d04eac85bfaea1cfaf16db675d3734fc74ca1b"}, - {file = "grpcio-1.65.0-cp310-cp310-macosx_12_0_universal2.whl", hash = "sha256:45d371dc4436fdcc31677f75b3ebe6175fbf0712ced49e0e4dfc18bbaf50f5a7"}, - {file = "grpcio-1.65.0-cp310-cp310-manylinux_2_17_aarch64.whl", hash = "sha256:02dbbe113ec48581da07b7ddf52bfd49f5772374c4b5e36ea25131ce00b4f4f3"}, - {file = "grpcio-1.65.0-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:5c9ee7b8f1ac82cc24f223cd7ec803c17079f90e63022d3e66c5e53fff0afb99"}, - {file = "grpcio-1.65.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:da927f8a44e42837ae0027a3a063c85e2b26491d2babd4554e116f66fd46045d"}, - {file = "grpcio-1.65.0-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:9916ea670a589f95f2453a4a5040294ace096271c126e684a1e45e61af76c988"}, - {file = "grpcio-1.65.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:c46114787c5f530e845d2781f914600aade04b4f132dd012efb31bc4f76a72bb"}, - {file = "grpcio-1.65.0-cp310-cp310-win32.whl", hash = "sha256:1362d94ac9c05b202736180d23296840e00f495859b206261e6ed03a6d41978b"}, - {file = "grpcio-1.65.0-cp310-cp310-win_amd64.whl", hash = "sha256:00ed0828980009ce852d98230cdd2d5a22a4bcb946b5a0f6334dfd8258374cd7"}, - {file = "grpcio-1.65.0-cp311-cp311-linux_armv7l.whl", hash = "sha256:25303f3747522252dd9cfcbacb88d828a36040f513e28fba17ee6184ebc3d330"}, - {file = "grpcio-1.65.0-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:2a2b368717dd8e0f6cb7e412d3b3bfb0012f61c04b2f76dbed669b0f5cf3fb0c"}, - {file = "grpcio-1.65.0-cp311-cp311-manylinux_2_17_aarch64.whl", hash = "sha256:93c41fb74c576dc0130b190a5775197282115c6abbe1d913d42d9a2f9d98fdae"}, - {file = "grpcio-1.65.0-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:34eb4fb9ef4d11ea741d264916d1b31a9e169d539a6f1c8300e04c493eec747e"}, - {file = "grpcio-1.65.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:55c41272f9d7d3503e3e3e93f3f98589f07075eebd24e1c291a1df2e8ef40a49"}, - {file = "grpcio-1.65.0-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:c275bac926754022c89ef03f16470f65b811e2cc25f2167d365564ad43e31001"}, - {file = "grpcio-1.65.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:b02db2a59071f4d05cfc4d0c972759778d27e1d3347f22ca178b91117ad10541"}, - {file = "grpcio-1.65.0-cp311-cp311-win32.whl", hash = "sha256:ec9f41b9b0eb6407a6edb21bc22cb32e03cae76cde9c1d8bb151ed77c2c5af94"}, - {file = "grpcio-1.65.0-cp311-cp311-win_amd64.whl", hash = "sha256:3efc8b0600870f5e518dd2738188b3ba7b1bb2668244c9a2a8c4debda4ffe62b"}, - {file = "grpcio-1.65.0-cp312-cp312-linux_armv7l.whl", hash = "sha256:d787abafafa9ed71e17220d4178c883abdb380e0484bd8965cb2e06375c7495b"}, - {file = "grpcio-1.65.0-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:52347f21d6ec77d7e7e4d5037f5e8ac0a0c851856d9459f9f95b009c2c740b4a"}, - {file = "grpcio-1.65.0-cp312-cp312-manylinux_2_17_aarch64.whl", hash = "sha256:b16e1cd9b9cb9ac942cb20b7a2b1c5d35b9e61017e2998bf242a6f7748071795"}, - {file = "grpcio-1.65.0-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:89bc9c8c6743a48f115fea8f3fada76be269d1914bf636e5fdb7cec9cdf192bc"}, - {file = "grpcio-1.65.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c5a2ae900e6423438c4a9a5be38e9228621340a18333371215c0419d24a254ef"}, - {file = "grpcio-1.65.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:4f451091ddd28f00c655f0b1e208cca705d40e4fde56a3cf849fead61a700d10"}, - {file = "grpcio-1.65.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:4e30cd885e02abb98d6b0d5beb6259a567b0ce1416c498ec815fe383adb77864"}, - {file = "grpcio-1.65.0-cp312-cp312-win32.whl", hash = "sha256:9a9a0ce10a07923ebd48c056060052ebddfbec3193cdd32207af358ef317b00a"}, - {file = "grpcio-1.65.0-cp312-cp312-win_amd64.whl", hash = "sha256:87d9350ffe1a84b7441db7c70fdb4e51269a379f7a95d696d0d133831c4f9a19"}, - {file = "grpcio-1.65.0-cp38-cp38-linux_armv7l.whl", hash = "sha256:0c504b30fc2fba143d9254e0240243b5866df9b7523162448797f4b21b5f30d5"}, - {file = "grpcio-1.65.0-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:480be4d41ceb5a7f22ecfc8db1ab68aeb58cc1a2da0865a91917d3cd0438dac7"}, - {file = "grpcio-1.65.0-cp38-cp38-manylinux_2_17_aarch64.whl", hash = "sha256:984a1627b50d5df4a24120302ca95adb5139ba1c40354ba258fc2913666d8ee7"}, - {file = "grpcio-1.65.0-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f242956c0f4985dfcc920cd251cd7a899ca168e157e98c9b74a688657e813ad6"}, - {file = "grpcio-1.65.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7ea93f570b2341c69635b8a333afb99fb4d5584f26a9cc94f06e56c943648aab"}, - {file = "grpcio-1.65.0-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:1bebefd76517a43d0e77a5dcd61a8b69e9775340d856a0b35c6368ae628f7714"}, - {file = "grpcio-1.65.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:356d10a491a92a08c21aef806379f7b020f591c23580e3d29aeeb59d45908c86"}, - {file = "grpcio-1.65.0-cp38-cp38-win32.whl", hash = "sha256:c3294fd3ef9faa1fe14ad15d72dd7d2ee9fee6d3bd29a08c53e59a3c94de9cc9"}, - {file = "grpcio-1.65.0-cp38-cp38-win_amd64.whl", hash = "sha256:a2defc49c984550f25034e88d17a7e69dba6deb2b981d8f56f19b3aaa788ff30"}, - {file = "grpcio-1.65.0-cp39-cp39-linux_armv7l.whl", hash = "sha256:b73022222ed4bf718d3d8527a9b88b162074a62c7530d30f4e951b56304b0f19"}, - {file = "grpcio-1.65.0-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:16e0f789158ecc8309e0a2f16cb8c5e4753f351a7673aab75f42783c83f1e38b"}, - {file = "grpcio-1.65.0-cp39-cp39-manylinux_2_17_aarch64.whl", hash = "sha256:cb0bd8bfba21fe0318317bf11687c67a3f8ce726369c0b3ccf4e6607fc5bc5f2"}, - {file = "grpcio-1.65.0-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d1096f0fa79ec601aefd71685d3a610cdde96274c38cd8adcef972660297669a"}, - {file = "grpcio-1.65.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e576a88ce82fea70e68c548aceb5cd560c27da50091581996858bbbe01230c83"}, - {file = "grpcio-1.65.0-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:ab70bd1ccb05ef373b691a9b9985289d8b2cf63c704471f5ee132e228d351af5"}, - {file = "grpcio-1.65.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:03eab632a8ce8dba00d97482d2821bf752a7c3cb4dc051be6c587ad3ca1c3e6d"}, - {file = "grpcio-1.65.0-cp39-cp39-win32.whl", hash = "sha256:f19bb85795ca82e007be427e7b6ac5e730023ffbab69d39ddeb1b84c6339df16"}, - {file = "grpcio-1.65.0-cp39-cp39-win_amd64.whl", hash = "sha256:dbd7eeafa67d8e403ac61caa31ebda2861435dcfd7bb7953c4ef05ad2ecf74bf"}, - {file = "grpcio-1.65.0.tar.gz", hash = "sha256:2c7891f66daefc80cce1bed6bc0c2802d26dac46544ba1be79c4e7d85661dd73"}, + {file = "grpcio-1.64.1-cp310-cp310-linux_armv7l.whl", hash = "sha256:55697ecec192bc3f2f3cc13a295ab670f51de29884ca9ae6cd6247df55df2502"}, + {file = "grpcio-1.64.1-cp310-cp310-macosx_12_0_universal2.whl", hash = "sha256:3b64ae304c175671efdaa7ec9ae2cc36996b681eb63ca39c464958396697daff"}, + {file = "grpcio-1.64.1-cp310-cp310-manylinux_2_17_aarch64.whl", hash = "sha256:bac71b4b28bc9af61efcdc7630b166440bbfbaa80940c9a697271b5e1dabbc61"}, + {file = "grpcio-1.64.1-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6c024ffc22d6dc59000faf8ad781696d81e8e38f4078cb0f2630b4a3cf231a90"}, + {file = "grpcio-1.64.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e7cd5c1325f6808b8ae31657d281aadb2a51ac11ab081ae335f4f7fc44c1721d"}, + {file = "grpcio-1.64.1-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:0a2813093ddb27418a4c99f9b1c223fab0b053157176a64cc9db0f4557b69bd9"}, + {file = "grpcio-1.64.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:2981c7365a9353f9b5c864595c510c983251b1ab403e05b1ccc70a3d9541a73b"}, + {file = "grpcio-1.64.1-cp310-cp310-win32.whl", hash = "sha256:1262402af5a511c245c3ae918167eca57342c72320dffae5d9b51840c4b2f86d"}, + {file = "grpcio-1.64.1-cp310-cp310-win_amd64.whl", hash = "sha256:19264fc964576ddb065368cae953f8d0514ecc6cb3da8903766d9fb9d4554c33"}, + {file = "grpcio-1.64.1-cp311-cp311-linux_armv7l.whl", hash = "sha256:58b1041e7c870bb30ee41d3090cbd6f0851f30ae4eb68228955d973d3efa2e61"}, + {file = "grpcio-1.64.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:bbc5b1d78a7822b0a84c6f8917faa986c1a744e65d762ef6d8be9d75677af2ca"}, + {file = "grpcio-1.64.1-cp311-cp311-manylinux_2_17_aarch64.whl", hash = "sha256:5841dd1f284bd1b3d8a6eca3a7f062b06f1eec09b184397e1d1d43447e89a7ae"}, + {file = "grpcio-1.64.1-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8caee47e970b92b3dd948371230fcceb80d3f2277b3bf7fbd7c0564e7d39068e"}, + {file = "grpcio-1.64.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:73819689c169417a4f978e562d24f2def2be75739c4bed1992435d007819da1b"}, + {file = "grpcio-1.64.1-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:6503b64c8b2dfad299749cad1b595c650c91e5b2c8a1b775380fcf8d2cbba1e9"}, + {file = "grpcio-1.64.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:1de403fc1305fd96cfa75e83be3dee8538f2413a6b1685b8452301c7ba33c294"}, + {file = "grpcio-1.64.1-cp311-cp311-win32.whl", hash = "sha256:d4d29cc612e1332237877dfa7fe687157973aab1d63bd0f84cf06692f04c0367"}, + {file = "grpcio-1.64.1-cp311-cp311-win_amd64.whl", hash = "sha256:5e56462b05a6f860b72f0fa50dca06d5b26543a4e88d0396259a07dc30f4e5aa"}, + {file = "grpcio-1.64.1-cp312-cp312-linux_armv7l.whl", hash = "sha256:4657d24c8063e6095f850b68f2d1ba3b39f2b287a38242dcabc166453e950c59"}, + {file = "grpcio-1.64.1-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:62b4e6eb7bf901719fce0ca83e3ed474ae5022bb3827b0a501e056458c51c0a1"}, + {file = "grpcio-1.64.1-cp312-cp312-manylinux_2_17_aarch64.whl", hash = "sha256:ee73a2f5ca4ba44fa33b4d7d2c71e2c8a9e9f78d53f6507ad68e7d2ad5f64a22"}, + {file = "grpcio-1.64.1-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:198908f9b22e2672a998870355e226a725aeab327ac4e6ff3a1399792ece4762"}, + {file = "grpcio-1.64.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:39b9d0acaa8d835a6566c640f48b50054f422d03e77e49716d4c4e8e279665a1"}, + {file = "grpcio-1.64.1-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:5e42634a989c3aa6049f132266faf6b949ec2a6f7d302dbb5c15395b77d757eb"}, + {file = "grpcio-1.64.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:b1a82e0b9b3022799c336e1fc0f6210adc019ae84efb7321d668129d28ee1efb"}, + {file = "grpcio-1.64.1-cp312-cp312-win32.whl", hash = "sha256:55260032b95c49bee69a423c2f5365baa9369d2f7d233e933564d8a47b893027"}, + {file = "grpcio-1.64.1-cp312-cp312-win_amd64.whl", hash = "sha256:c1a786ac592b47573a5bb7e35665c08064a5d77ab88a076eec11f8ae86b3e3f6"}, + {file = "grpcio-1.64.1-cp38-cp38-linux_armv7l.whl", hash = "sha256:a011ac6c03cfe162ff2b727bcb530567826cec85eb8d4ad2bfb4bd023287a52d"}, + {file = "grpcio-1.64.1-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:4d6dab6124225496010bd22690f2d9bd35c7cbb267b3f14e7a3eb05c911325d4"}, + {file = "grpcio-1.64.1-cp38-cp38-manylinux_2_17_aarch64.whl", hash = "sha256:a5e771d0252e871ce194d0fdcafd13971f1aae0ddacc5f25615030d5df55c3a2"}, + {file = "grpcio-1.64.1-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:2c3c1b90ab93fed424e454e93c0ed0b9d552bdf1b0929712b094f5ecfe7a23ad"}, + {file = "grpcio-1.64.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:20405cb8b13fd779135df23fabadc53b86522d0f1cba8cca0e87968587f50650"}, + {file = "grpcio-1.64.1-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:0cc79c982ccb2feec8aad0e8fb0d168bcbca85bc77b080d0d3c5f2f15c24ea8f"}, + {file = "grpcio-1.64.1-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:a3a035c37ce7565b8f4f35ff683a4db34d24e53dc487e47438e434eb3f701b2a"}, + {file = "grpcio-1.64.1-cp38-cp38-win32.whl", hash = "sha256:1257b76748612aca0f89beec7fa0615727fd6f2a1ad580a9638816a4b2eb18fd"}, + {file = "grpcio-1.64.1-cp38-cp38-win_amd64.whl", hash = "sha256:0a12ddb1678ebc6a84ec6b0487feac020ee2b1659cbe69b80f06dbffdb249122"}, + {file = "grpcio-1.64.1-cp39-cp39-linux_armv7l.whl", hash = "sha256:75dbbf415026d2862192fe1b28d71f209e2fd87079d98470db90bebe57b33179"}, + {file = "grpcio-1.64.1-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:e3d9f8d1221baa0ced7ec7322a981e28deb23749c76eeeb3d33e18b72935ab62"}, + {file = "grpcio-1.64.1-cp39-cp39-manylinux_2_17_aarch64.whl", hash = "sha256:5f8b75f64d5d324c565b263c67dbe4f0af595635bbdd93bb1a88189fc62ed2e5"}, + {file = "grpcio-1.64.1-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c84ad903d0d94311a2b7eea608da163dace97c5fe9412ea311e72c3684925602"}, + {file = "grpcio-1.64.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:940e3ec884520155f68a3b712d045e077d61c520a195d1a5932c531f11883489"}, + {file = "grpcio-1.64.1-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:f10193c69fc9d3d726e83bbf0f3d316f1847c3071c8c93d8090cf5f326b14309"}, + {file = "grpcio-1.64.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:ac15b6c2c80a4d1338b04d42a02d376a53395ddf0ec9ab157cbaf44191f3ffdd"}, + {file = "grpcio-1.64.1-cp39-cp39-win32.whl", hash = "sha256:03b43d0ccf99c557ec671c7dede64f023c7da9bb632ac65dbc57f166e4970040"}, + {file = "grpcio-1.64.1-cp39-cp39-win_amd64.whl", hash = "sha256:ed6091fa0adcc7e4ff944090cf203a52da35c37a130efa564ded02b7aff63bcd"}, + {file = "grpcio-1.64.1.tar.gz", hash = "sha256:8d51dd1c59d5fa0f34266b80a3805ec29a1f26425c2a54736133f6d87fc4968a"}, ] [package.extras] -protobuf = ["grpcio-tools (>=1.65.0)"] +protobuf = ["grpcio-tools (>=1.64.1)"] [[package]] name = "grpcio-status" @@ -1910,13 +1910,13 @@ jupyter-server = ">=1.1.2" [[package]] name = "jupyter-server" -version = "2.14.1" +version = "2.14.2" description = "The backend—i.e. core services, APIs, and REST endpoints—to Jupyter web applications." optional = false python-versions = ">=3.8" files = [ - {file = "jupyter_server-2.14.1-py3-none-any.whl", hash = "sha256:16f7177c3a4ea8fe37784e2d31271981a812f0b2874af17339031dc3510cc2a5"}, - {file = "jupyter_server-2.14.1.tar.gz", hash = "sha256:12558d158ec7a0653bf96cc272bc7ad79e0127d503b982ed144399346694f726"}, + {file = "jupyter_server-2.14.2-py3-none-any.whl", hash = "sha256:47ff506127c2f7851a17bf4713434208fc490955d0e8632e95014a9a9afbeefd"}, + {file = "jupyter_server-2.14.2.tar.gz", hash = "sha256:66095021aa9638ced276c248b1d81862e4c50f292d575920bbe960de1c56b12b"}, ] [package.dependencies] @@ -2158,13 +2158,13 @@ files = [ [[package]] name = "lightning-utilities" -version = "0.11.3.post0" +version = "0.11.5" description = "Lightning toolbox for across the our ecosystem." optional = false python-versions = ">=3.8" files = [ - {file = "lightning_utilities-0.11.3.post0-py3-none-any.whl", hash = "sha256:2aec1d067e5ab61a8978f879998850a97f9a3764ee54aade329552706b0d189b"}, - {file = "lightning_utilities-0.11.3.post0.tar.gz", hash = "sha256:7485fad0e3c5607a6bde4507935689c553a2c91325de2127b4bb8171a601e236"}, + {file = "lightning_utilities-0.11.5-py3-none-any.whl", hash = "sha256:ab2117cc926a9e3757919e25a0da574badb1c0f04fc931849235731b78016a8d"}, + {file = "lightning_utilities-0.11.5.tar.gz", hash = "sha256:a96bee6d8b3df18b7c1a8dec83b2adb03dca6ca0ce3ae9fd355eb0922c4e5e07"}, ] [package.dependencies] @@ -2179,7 +2179,7 @@ typing = ["mypy (>=1.0.0)", "types-setuptools"] [[package]] name = "lightrag" -version = "0.0.0-beta.1" +version = "0.1.0-beta.1" description = "The Lightning Library for LLM Applications." optional = false python-versions = ">=3.9, <4.0" @@ -2196,6 +2196,17 @@ pyyaml = "^6.0.1" tiktoken = "^0.7.0" tqdm = "^4.66.4" +[package.extras] +anthropic = ["anthropic (>=0.31.1,<0.32.0)"] +cohere = ["cohere (>=5.5.8,<6.0.0)"] +faiss-cpu = ["faiss-cpu (>=1.8.0,<2.0.0)"] +google-generativeai = ["google-generativeai (>=0.7.2,<0.8.0)"] +groq = ["groq (>=0.5.0,<0.6.0)"] +openai = ["openai (>=1.12.0,<2.0.0)"] +pgvector = ["pgvector (>=0.3.1,<0.4.0)"] +sqlalchemy = ["sqlalchemy (>=2.0.30,<3.0.0)"] +torch = ["torch (>=2.3.1,<3.0.0)"] + [package.source] type = "directory" url = "lightrag" @@ -4058,26 +4069,26 @@ files = [ [[package]] name = "sphinx" -version = "7.3.7" +version = "7.4.4" description = "Python documentation generator" optional = false python-versions = ">=3.9" files = [ - {file = "sphinx-7.3.7-py3-none-any.whl", hash = "sha256:413f75440be4cacf328f580b4274ada4565fb2187d696a84970c23f77b64d8c3"}, - {file = "sphinx-7.3.7.tar.gz", hash = "sha256:a4a7db75ed37531c05002d56ed6948d4c42f473a36f46e1382b0bd76ca9627bc"}, + {file = "sphinx-7.4.4-py3-none-any.whl", hash = "sha256:0b800d06701329cba601a40ab8c3d5afd8f7e3518f688dda61fd670effc327d2"}, + {file = "sphinx-7.4.4.tar.gz", hash = "sha256:43c911f997a4530b6cffd4ff8d5516591f6c60d178591f4406f0dd02282e3f64"}, ] [package.dependencies] alabaster = ">=0.7.14,<0.8.0" -babel = ">=2.9" -colorama = {version = ">=0.4.5", markers = "sys_platform == \"win32\""} -docutils = ">=0.18.1,<0.22" +babel = ">=2.13" +colorama = {version = ">=0.4.6", markers = "sys_platform == \"win32\""} +docutils = ">=0.20,<0.22" imagesize = ">=1.3" -Jinja2 = ">=3.0" -packaging = ">=21.0" -Pygments = ">=2.14" -requests = ">=2.25.0" -snowballstemmer = ">=2.0" +Jinja2 = ">=3.1" +packaging = ">=23.0" +Pygments = ">=2.17" +requests = ">=2.30.0" +snowballstemmer = ">=2.2" sphinxcontrib-applehelp = "*" sphinxcontrib-devhelp = "*" sphinxcontrib-htmlhelp = ">=2.0.0" @@ -4087,8 +4098,8 @@ sphinxcontrib-serializinghtml = ">=1.1.9" [package.extras] docs = ["sphinxcontrib-websupport"] -lint = ["flake8 (>=3.5.0)", "importlib_metadata", "mypy (==1.9.0)", "pytest (>=6.0)", "ruff (==0.3.7)", "sphinx-lint", "tomli", "types-docutils", "types-requests"] -test = ["cython (>=3.0)", "defusedxml (>=0.7.1)", "pytest (>=6.0)", "setuptools (>=67.0)"] +lint = ["flake8 (>=6.0)", "importlib-metadata (>=6.0)", "mypy (==1.10.1)", "pytest (>=6.0)", "ruff (==0.5.2)", "sphinx-lint (>=0.9)", "tomli (>=2)", "types-docutils (==0.21.0.20240711)", "types-requests (>=2.30.0)"] +test = ["cython (>=3.0)", "defusedxml (>=0.7.1)", "pytest (>=8.0)", "setuptools (>=70.0)", "typing_extensions (>=4.9)"] [[package]] name = "sphinx-copybutton" diff --git a/tutorials/.gitignore b/tutorials/.gitignore index 9c7ad1fb..39a72bfd 100644 --- a/tutorials/.gitignore +++ b/tutorials/.gitignore @@ -2,3 +2,4 @@ *logs test* *.pkl +.storage/ diff --git a/tutorials/rag.ipynb b/tutorials/rag.ipynb index 968673fa..19296226 100644 --- a/tutorials/rag.ipynb +++ b/tutorials/rag.ipynb @@ -9,18 +9,6 @@ "espeically the data processing pipeline to help us with experiments where we will see how useful they are in benchmarking." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# data_processing config\n", - "\n", - "config = {\n", - " " - ] - }, { "cell_type": "code", "execution_count": 1, @@ -30,7 +18,7 @@ "# the data pipeline and the backend data processing\n", "from lightrag.core.embedder import Embedder \n", "from lightrag.core.types import ModelClientType\n", - "from lightrag.components.data_process import DocumentSplitter, ToEmbeddings\n", + "from lightrag.components.data_process import TextSplitter, ToEmbeddings\n", "from lightrag.core.component import Sequential\n", "\n", "def prepare_data_pipeline():\n", @@ -46,7 +34,7 @@ " \"split_overlap\": 10\n", " }\n", "\n", - " splitter = DocumentSplitter(**splitter_config)\n", + " splitter = TextSplitter(**splitter_config)\n", " embedder = Embedder(model_client =ModelClientType.OPENAI(), model_kwargs=model_kwargs)\n", " embedder_transformer = ToEmbeddings(embedder, batch_size=2)\n", " data_transformer = Sequential(splitter, embedder_transformer)\n", diff --git a/use_cases/.gitignore b/use_cases/.gitignore index f73806eb..d7674511 100644 --- a/use_cases/.gitignore +++ b/use_cases/.gitignore @@ -1 +1,2 @@ -.ipynb_checkpoints/ \ No newline at end of file +.ipynb_checkpoints/ +../index.faiss diff --git a/use_cases/rag.py b/use_cases/rag_yaml_config.py similarity index 100% rename from use_cases/rag.py rename to use_cases/rag_yaml_config.py diff --git a/use_cases/simple_rag.py b/use_cases/simple_rag.py index e45eba17..499f6702 100644 --- a/use_cases/simple_rag.py +++ b/use_cases/simple_rag.py @@ -1,87 +1,74 @@ from typing import Any, List, Optional - - -from lightrag.core.generator import Generator -from lightrag.core.embedder import Embedder - -from lightrag.core.types import Document +import os +from lightrag.core import Component, Generator, Embedder, Sequential +from lightrag.core.types import Document, ModelClientType from lightrag.core.string_parser import JsonParser -from lightrag.core.component import Component, Sequential from lightrag.core.db import LocalDB +from lightrag.utils import setup_env -from lightrag.components.retriever import FAISSRetriever +from lightrag.components.retriever.faiss_retriever import FAISSRetriever from lightrag.components.model_client import OpenAIClient from lightrag.components.data_process import ( RetrieverOutputToContextStr, ToEmbeddings, - DocumentSplitter, + TextSplitter, ) - +setup_env() # TODO: RAG can potentially be a component itsefl and be provided to the users -class RAG(Component): - def __init__(self): - super().__init__() +configs = { + "embedder": { + "batch_size": 100, + "model_kwargs": { + "model": "text-embedding-3-small", + "dimensions": 256, + "encoding_format": "float", + }, + }, + "retriever": { + "top_k": 2, + }, + "generator": { + "model": "gpt-3.5-turbo", + "temperature": 0.3, + "stream": False, + }, + "text_splitter": { + "split_by": "word", + "chunk_size": 400, + "chunk_overlap": 200, + }, +} + + +# use data process complete that will transform on Document structure +def prepare_data_pipeline(): + splitter = TextSplitter(**configs["text_splitter"]) + embedder = Embedder( + model_client=ModelClientType.OPENAI(), + model_kwargs=configs["embedder"]["model_kwargs"], + ) + embedder_transformer = ToEmbeddings( + embedder=embedder, batch_size=configs["embedder"]["batch_size"] + ) + data_transformer = Sequential(splitter, embedder_transformer) + return data_transformer - self.vectorizer_settings = { - "batch_size": 100, - "model_kwargs": { - "model": "text-embedding-3-small", - "dimensions": 256, - "encoding_format": "float", - }, - } - self.retriever_settings = { - "top_k": 2, - } - self.generator_model_kwargs = { - "model": "gpt-3.5-turbo", - "temperature": 0.3, - "stream": False, - } - self.text_splitter_settings = { # TODO: change it to direct to spliter kwargs - "split_by": "word", - "chunk_size": 400, - "chunk_overlap": 200, - } - vectorizer = Embedder( - model_client=OpenAIClient(), - # batch_size=self.vectorizer_settings["batch_size"], #TODO: where to put the batch size control and how big can it go? - model_kwargs=self.vectorizer_settings["model_kwargs"], - # output_processors=ToEmbedderResponse(), - ) - # TODO: check document splitter, how to process the parent and order of the chunks - text_splitter = DocumentSplitter( - split_by=self.text_splitter_settings["split_by"], - split_length=self.text_splitter_settings["chunk_size"], - split_overlap=self.text_splitter_settings["chunk_overlap"], - ) - self.data_transformer = Sequential( - text_splitter, - ToEmbeddings( - embedder=vectorizer, - batch_size=self.vectorizer_settings["batch_size"], - ), - ) - # TODO: make a new key - self.data_transformer_key = self.data_transformer._get_name() - # initialize retriever, which depends on the vectorizer too - self.retriever = FAISSRetriever( - top_k=self.retriever_settings["top_k"], - dimensions=self.vectorizer_settings["model_kwargs"]["dimensions"], - vectorizer=vectorizer, - ) - self.retriever_output_processors = RetrieverOutputToContextStr(deduplicate=True) - # TODO: currently retriever will be applied on transformed data. but its not very obvious design pattern - self.db = LocalDB() +def prepare_database_with_index(docs: List[Document], index_path: str = "index.faiss"): + if os.path.exists(index_path): + return None + db = LocalDB() + db.load(docs) + data_transformer = prepare_data_pipeline() + db.transform(data_transformer, key="data_transformer") + # store + db.save_state(index_path) - # initialize generator - self.generator = Generator( - preset_prompt_kwargs={ - "task_desc_str": r""" + +rag_prompt_task_desc = r""" You are a helpful assistant. Your task is to answer the query that may or may not come with context information. @@ -91,23 +78,44 @@ def __init__(self): { "answer": "The answer to the query", }""" + + +class RAG(Component): + + def __init__(self, index_path: str = "index.faiss"): + super().__init__() + + self.db = LocalDB.load_state(index_path) + + self.transformed_docs: List[Document] = self.db.get_transformed_data( + "data_transformer" + ) + embedder = Embedder( + model_client=ModelClientType.OPENAI(), + model_kwargs=configs["embedder"]["model_kwargs"], + ) + # map the documents to embeddings + self.retriever = FAISSRetriever( + **configs["retriever"], + embedder=embedder, + documents=self.transformed_docs, + document_map_func=lambda doc: doc.vector, + ) + self.retriever_output_processors = RetrieverOutputToContextStr(deduplicate=True) + self.generator = Generator( + model_client=ModelClientType.OPENAI(), + model_kwargs=configs["generator"], + output_processors=JsonParser(), + ) + + self.generator = Generator( + prompt_kwargs={ + "task_desc_str": rag_prompt_task_desc, }, model_client=OpenAIClient(), - model_kwargs=self.generator_model_kwargs, + model_kwargs=configs["generator"], output_processors=JsonParser(), ) - self.tracking = { - "vectorizer": {"num_calls": 0, "num_tokens": 0} - } # TODO: tracking of the usage can be added in default in APIClient component - - def build_index(self, documents: List[Document]): - self.db.load_documents(documents) - self.map_key = self.db.map_data() - print(f"map_key: {self.map_key}") - self.data_key = self.db.transform_data(self.data_transformer) - print(f"data_key: {self.data_key}") - self.transformed_documents = self.db.get_transformed_data(self.data_key) - self.retriever.build_index_from_documents(self.transformed_documents) def generate(self, query: str, context: Optional[str] = None) -> Any: if not self.generator: @@ -125,18 +133,19 @@ def call(self, query: str) -> Any: # fill in the document for i, retriever_output in enumerate(retrieved_documents): retrieved_documents[i].documents = [ - self.transformed_documents[doc_index] - for doc_index in retriever_output.doc_indexes + self.transformed_docs[doc_index] + for doc_index in retriever_output.doc_indices ] - # convert all the documents to context string + print(f"retrieved_documents: \n {retrieved_documents}") context_str = self.retriever_output_processors(retrieved_documents) + print(f"context_str: \n {context_str}") + return self.generate(query, context=context_str) if __name__ == "__main__": - # NOTE: for the ouput of this following code, check text_lightrag.txt doc1 = Document( meta_data={"title": "Li Yin's profile"}, text="My name is Li Yin, I love rock climbing" + "lots of nonsense text" * 500, @@ -145,19 +154,16 @@ def call(self, query: str) -> Any: doc2 = Document( meta_data={"title": "Interviewing Li Yin"}, text="lots of more nonsense text" * 250 - + "Li Yin is a software developer and AI researcher" + + "Li Yin is an AI researcher and a software engineer" + "lots of more nonsense text" * 250, id="doc2", ) - rag = RAG() + # only run it once to prepare the data, if index exists, it will not run + prepare_database_with_index([doc1, doc2], index_path="index.faiss") + rag = RAG(index_path="index.faiss") print(rag) - rag.build_index([doc1, doc2]) - print(rag.tracking) query = "What is Li Yin's hobby and profession?" response = rag.call(query) - # print(f"execution graph: {rag._execution_graph}") print(f"response: {response}") - # print(f"subcomponents: {rag._components}") - # rag.visualize_graph_html("my_component_graph.html") From dce8572e6b7a176cd25b7e51d28c542b08c3e2d4 Mon Sep 17 00:00:00 2001 From: Li Yin Date: Mon, 15 Jul 2024 16:14:23 -0700 Subject: [PATCH 11/12] rename simple_rag to rag --- use_cases/{simple_rag.py => rag.py} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename use_cases/{simple_rag.py => rag.py} (100%) diff --git a/use_cases/simple_rag.py b/use_cases/rag.py similarity index 100% rename from use_cases/simple_rag.py rename to use_cases/rag.py From dd3ffc722f5ae89bc85138faa3de4c1436e116ab Mon Sep 17 00:00:00 2001 From: Li Yin Date: Mon, 15 Jul 2024 16:22:13 -0700 Subject: [PATCH 12/12] add the colab quick start in the documents intro page --- docs/source/index.rst | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 8439d13a..419e1a94 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -8,13 +8,23 @@ .. raw:: html +

+ + + + Try Quickstart in Colab + +

+