Merge pull request #61 from SylphAI-Inc/li
Li
liyin2015 authored Jun 30, 2024
2 parents 3ff872f + 745a421 commit 8b7d876
Showing 6 changed files with 129 additions and 53 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -26,6 +26,7 @@ class Net(nn.Module):
x = self.dropout2(x)
x = self.fc1(x)
return self.fc2(x)
```

**LightRAG**

28 changes: 26 additions & 2 deletions docs/source/developer_notes/class_hierarchy.rst
@@ -1,6 +1,30 @@
LightRAG Class Hierarchy
Class Hierarchy
=============================
From the plot of the `LightRAG` library's class hierarchy, we can see that the library is centered around two base classes, `Component` and `DataClass`, with no more than two levels of subclasses.
This design philosophy results in a library with the bare minimum of abstraction, providing developers with maximum customizability.
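
For a concrete picture of what such a shallow hierarchy looks like, here is a minimal sketch; the subclass body is an illustrative assumption, not the library's actual definition:

.. code-block:: python

    class Component:
        """Level-0 base class: every pipeline building block derives from it."""
        def call(self, *args, **kwargs):
            raise NotImplementedError

    class Generator(Component):
        """Level-1 subclass: only one step away from the base class."""
        def call(self, prompt: str) -> str:
            return f"response to: {prompt}"  # placeholder for a real model call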

.. raw:: html

<iframe src="../_static/class_hierarchy.html" width="100%" height="750px"></iframe>
<style>
.iframe-container {
width: 100%;
height: 100vh; /* Full height of the viewport */
max-height: 1000px; /* Maximum height to ensure it doesn't get too tall on larger screens */
overflow: hidden;
}
.iframe-container iframe {
width: 100%;
height: 100%;
border: none;
}
@media (max-width: 768px) {
.iframe-container {
height: 60vh; /* Adjust height for mobile viewports */
max-height: none; /* Remove the maximum height constraint for small screens */
}
}
</style>

<div class="iframe-container">
<iframe src="../_static/class_hierarchy.html"></iframe>
</div>
29 changes: 14 additions & 15 deletions docs/source/developer_notes/index.rst
@@ -1,13 +1,14 @@
.. _developer_notes:


Developer Notes
Tutorials
=============================

*Why and How each part works*
*Why and How Each Part Works*

Learn the `why` and `how-to` (customize and integrate) behind each core part within the `LightRAG` library.
These are our most important tutorials before you move ahead to build use cases (LLM applications) end to end.

Learn the LightRAG design philosophy and the `why` and `how-to` (customize and integrate) behind each core part within the LightRAG library.
These are our tutorials to work through before you move ahead to build use cases (LLM applications) end to end.

.. raw::
@@ -22,24 +23,22 @@ This is our tutorials before you move ahead to build use cases (LLM application
:align: center
:width: 600px

LLM application is no different from a mode training/eval workflow
An LLM application is no different from a model training/evaluation workflow

.. :height: 100px
.. :width: 200px
LightRAG library focus on providing building blocks for developers to **build** and **optimize** the `task pipeline`.
We have clear design phisolophy: :doc:`lightrag_design_philosophy`.



.. :maxdepth: 1
.. :hidden:
.. lightrag_design_philosophy
The `LightRAG` library focuses on providing building blocks for developers to **build** and **optimize** the task pipeline.
We have a clear :doc:`lightrag_design_philosophy`, which results in this :doc:`class_hierarchy`.

.. toctree::
:maxdepth: 1
:caption: Introduction
:hidden:

.. llm_intro
lightrag_design_philosophy
class_hierarchy



118 changes: 84 additions & 34 deletions docs/source/developer_notes/lightrag_design_philosophy.rst
@@ -1,52 +1,34 @@
LightRAG Design Philosophy
Design Philosophy
====================================

.. Deep understanding of the LLM workflow
.. ---------------------------------------
Right from the beginning, `LightRAG` has followed three fundamental principles.

LLMs are like `water`, it is all up to users to shape it into different use cases. In `PyTorch`, most likely users do not need to build their
own ``conv`` or ``linear`` module, or their own ``Adam`` optimizer. Their building blocks can meet > 90% of their user's needs on `building` and
`optimizing` (training) their models, leaving less than 10% of users, mostly contributors and researchers to build their own ``Module``, ``Tensor``,
``Optimizer``, etc. Libraries like `PyTorch`, `numpy`, `scipy`, `sklearn`, `pandas`, etc. are all doing the heavy lifting on the computation optimization.
However, for developers to write their own LLM task pipeline, calling apis or using local LLMs to shape the LLMs via prompt into any use case is not a hard feat.
The hard part is on `evaluating` and `optimizing` their task pipeline.

Optimizing over Building
Principle 1: Quality over Quantity
-----------------------------------------------------------------------

We help users to build the task pipeline, but we want to help on optimizing even more so.
The Quality of core building blocks over the Quantity of integrations.

In fact, building the task pipeline accounts for only **10%** of users' development process, the other **90%** is on optimtizing and iterating.
The most popular libraries like ``Langchain`` and ``LlamaIndex`` are mainly focusing on `building` the task pipeline, prioritizing integrations and coveraging on different type of tasks, resulting large amounts of classes, each
with many layers of class inheritance. With the existing libraries, users get stuck on just following the examples, and it requires more time for them to figure out customization than writing their
own code.
We aim to provide developers with well-designed core building blocks that are **easy** to understand, **transparent** to debug, and **flexible** enough to customize.
This goes for the prompt, the model client, the retriever, the optimizer, and the trainer.
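
As a minimal sketch of what this flexibility implies, customizing a building block should take a single subclass. The base-class interface below is an assumption for illustration, not LightRAG's exact signature:

.. code-block:: python

    class ModelClient:
        """Assumed interface: turn request kwargs into model output text."""
        def call(self, api_kwargs: dict) -> str:
            raise NotImplementedError

    class MyLocalModelClient(ModelClient):
        """A developer-owned client; any local model or HTTP API fits here."""
        def call(self, api_kwargs: dict) -> str:
            prompt = api_kwargs.get("prompt", "")
            return f"[local model output for] {prompt}"  # stubbed inference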

How to `build` the task pipeline has starting to mature: `prompt`, `retriever`, `generator`, `RAG`, `Agent` has becoming well-known concepts.
How to `optimize` the task pipeline is still a mystery to most users. And most are still doing `manual` prompt engineering without good
`observability` (or `debugging` ) tools. And these existing `observability` tools are mostly commercialized, prioritizing the `fancy` looks without
real deep understanding of the LLM workflow.

The existing optimization process of LLM applications are full of frustrations.

Quality over Quantity
Principle 2: Optimizing over Building
-----------------------------------------------------------------------

The Quality of core building blocks over the Quantity of integrations.
We help users build the task pipeline, but we want to help even more with optimizing it.



The whole `PyTorch` library is built on a few core and base classes: ``Module``, ``Tensor``, ``Parameter``, and ``Optimizer``,
and various ``nn`` modules for users to build a model, along with ``functionals``.
This maps to ``Component``, ``DataClass``, ``Parameter``, and ``Optimizer`` in LightRAG, and various subcomponents
like ``Generator``, ``Retriever``, ``Prompt``, ``Embedder``, ``ModelClient``, along with ``functionals`` to process string,
interprect tool from the string.
We design our building blocks with `optimization` in mind.
This means that, beyond giving developers transparency and control, we provide great `logging`, `observability`, `configurability`, `optimizers`, and `trainers`
to ease the existing frustrations of optimizing the task pipeline.
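
A small sketch of the kind of observability this implies, using only the Python standard library (this is illustrative, not LightRAG's logging API):

.. code-block:: python

    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("task_pipeline")

    def observed(generate, prompt: str) -> str:
        """Wrap any generate(prompt) callable with timing and I/O logging."""
        start = time.perf_counter()
        output = generate(prompt)
        logger.info("prompt=%r output=%r latency=%.3fs",
                    prompt, output, time.perf_counter() - start)
        return output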

We recognize developers who are building real-world Large Language Model (LLM) applications are the real heroes, doing the hard
work. They need well-designed core building blocks: **easy** to understand, **transparent** to debug, **flexible** enough to customize their own
``ModelClient``, their own ``Prompt``, their own ``Generator`` and even their own ``Optimizer``, ``Trainer``. The need to build their own component is even more so than using `PyTorch.`
LightRAG aggressively focus on the quality and clarity of the core building blocks over the quantity of integrations.



Practicality over Showmanship
Principle 3: Practicality over Showmanship
-----------------------------------------------------------------------
We put these three hard rules while designing LightRAG:

@@ -56,7 +38,75 @@ We put these three hard rules while designing LightRAG:



Our deep understanding of the LLM workflow
-----------------------------------------------------------------------

The above principles are distilled from our deep understanding of the LLM workflow.


**Developers are the ultimate heroes**

LLMs are like `water`: they can do almost anything, from GenAI applications such as `chatbot`, `translation`, `summarization`, `code generation`, and `autonomous agent` to classical NLP tasks like `text classification` and `named entity recognition`.
They interact with the world beyond the model's internal knowledge via `retriever`, `memory`, and `tools` (`function calls`).
Each use case is unique in its data, its business logic, and its user experience.


Building LLM applications is a combination of software engineering and modeling (in-context learning).
Libraries like `PyTorch` mainly provide basic building blocks and do the heavy lifting on computation optimization.
If 10% of all `PyTorch` users need to customize a layer or an optimizer, the share of developers who need customization will only be higher for LLM applications.
Any library aiming to provide out-of-the-box solutions is destined to fail, as it is up to the developers to address each unique challenge.



**Manual prompt engineering vs. auto-prompt optimization**

Developers rely on prompting to shape the LLMs into their use cases via in-context learning (ICL).
However, LLM prompting is highly sensitive: the accuracy gap between top-performing and lower-performing prompts can be as high as 40%.
It is also a brittle process that breaks the moment your model changes.
Because of this, developers end up spending only **10%** of their time building the task pipeline itself and the other **90%** optimizing and iterating on the prompt.
The process of closing the accuracy gap between the demo and production is full of frustrations.
There is no doubt that the future of LLM applications lies in auto-prompt optimization, not manual prompt engineering.
However, researchers are still trying to understand prompt engineering itself, and the process of automating it is even more in its infancy.
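
The core idea can be sketched without any framework: score each candidate prompt on a small labeled set and keep the best. This toy loop (with the model passed in as a callable) is what manual prompt engineering does by hand and what an auto-prompt optimizer searches programmatically:

.. code-block:: python

    def accuracy(template: str, model, dataset) -> float:
        """Fraction of (input, label) pairs the model answers correctly
        when the input is rendered through this prompt template."""
        hits = sum(model(template.format(x=x)) == y for x, y in dataset)
        return hits / len(dataset)

    def best_prompt(candidates, model, dataset) -> str:
        # A brute-force stand-in for an optimizer's search over prompt space.
        return max(candidates, key=lambda t: accuracy(t, model, dataset))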

**Know where the heavy lifting is**

The heavy lifting of an LLM library is not in providing developers with out-of-the-box prompts, nor in integrations with different API providers or databases; it is in:

- Core base classes and abstractions that help developers with "boring" things like serialization, deserialization, standardizing interfaces, and data processing.
- Building blocks that help LLMs interact with the world.
- `Evaluating` and `optimizing` the task pipeline.

All while giving full control of the prompt and the task pipeline to the developers.
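
As a minimal sketch of the first point, here is the sort of "boring" (de)serialization work a base data class can absorb, using only the standard library (the real ``DataClass`` may differ):

.. code-block:: python

    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class QARecord:
        """Example structured data flowing through a task pipeline."""
        question: str
        answer: str

        def to_json(self) -> str:
            return json.dumps(asdict(self))

        @classmethod
        def from_json(cls, s: str) -> "QARecord":
            return cls(**json.loads(s))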





.. The whole `PyTorch` library is built on a few core and base classes: ``Module``, ``Tensor``, ``Parameter``, and ``Optimizer``,
.. and various ``nn`` modules for users to build a model, along with ``functionals``.
.. This maps to ``Component``, ``DataClass``, ``Parameter``, and ``Optimizer`` in LightRAG, and various subcomponents
.. like ``Generator``, ``Retriever``, ``Prompt``, ``Embedder``, ``ModelClient``, along with ``functionals`` to process strings and
.. interpret tools from strings.
.. We recognize that developers who are building real-world Large Language Model (LLM) applications are the real heroes, doing the hard
.. work. They need well-designed core building blocks: **easy** to understand, **transparent** to debug, and **flexible** enough to customize their own
.. ``ModelClient``, their own ``Prompt``, their own ``Generator``, and even their own ``Optimizer`` and ``Trainer``. The need to build their own components is even greater than with `PyTorch`.
.. LightRAG aggressively focuses on the quality and clarity of the core building blocks over the quantity of integrations.
.. The current state of the art in auto-prompt optimization is still in its infancy.
.. Though auto-prompt optimization is the future, we are still in the process of understanding prompt engineering itself, which is a good starting point for auto-prompt optimization.
.. The future is in the optimizing.
.. Using LLMs via APIs or local LLMs is easy, so where is the value of having a library like `LightRAG`?
.. In `PyTorch`, most users do not need to build their own ``conv`` or ``linear`` module, or their own ``Adam`` optimizer.
.. The existing building blocks can meet > 90% of users' needs, leaving less than 10% of users, mostly contributors and researchers, to build their own ``Module``, ``Tensor``,
.. ``Optimizer``, etc. Excellent libraries like `PyTorch`, `numpy`, `scipy`, `sklearn`, and `pandas` all do the heavy lifting on computation optimization.
[Optional] Side story: How `LightRAG` is born
----------------------------------------------
.. Using LLMs via APIs or local LLMs is easy, so where is the heavy lifting in LLM applications?
2 changes: 1 addition & 1 deletion docs/source/get_started/index.rst
@@ -8,6 +8,6 @@ Here is the content of our documentation project.
.. toctree::
:maxdepth: 2

lightrag_in_10_mins
installation
community
lightrag_in_10_mins
4 changes: 3 additions & 1 deletion docs/source/index.rst
@@ -127,7 +127,9 @@ On the contrary, we have to do 'more' and go 'deeper' and 'wider' on any topic t
.. - LightRAG provides advanced tooling for developers to build `agents`, `tools/function calls`, etc., without relying on any proprietary API provider's 'advanced' features such as `OpenAI` assistant, tools, and JSON format
It is the future of LLM applications
.. It is the future of LLM applications
Unites both Research and Production
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On top of ease of use, we particularly optimize the configurability of components so that researchers can build their own solutions and benchmark existing ones.
