[Feature] AnythingLLM use locally hosted Llama.cpp and GGUF files for inferencing (#413)

* Implement use of native embedder (all-MiniLM-L6-v2)
stop showing prisma queries during dev

* Add native embedder as an available embedder selection

* wrap model loader in try/catch

* print progress on download

* add built-in LLM support (experimental)

* Update to progress output for embedder

* move embedder selection options to component

* safety checks for modelfile

* update ref

* Hide selection when on hosted subdomain

* update documentation
hide localLlama when on hosted

* safety checks for storage of models

* update dockerfile to pre-build Llama.cpp bindings

* update lockfile

* add langchain doc comment

* remove extraneous --no-metal option

* Show data handling for private LLM

* persist model in memory for N+1 chats

* update import
update dev comment on token model size

* update primary README

* chore: more readme updates and remove screenshots - too much to maintain, just use the app!

* remove screenshot link
timothycarambat authored Dec 7, 2023
1 parent fecfb0f commit 655ebd9
Showing 22 changed files with 1,304 additions and 99 deletions.
28 changes: 16 additions & 12 deletions README.md
@@ -3,7 +3,7 @@
</p>

<p align="center">
<b>AnythingLLM: A document chatbot to chat with <i>anything!</i></b>. <br />
<b>AnythingLLM: A private ChatGPT to chat with <i>anything!</i></b>. <br />
An efficient, customizable, and open-source enterprise-ready document chatbot solution.
</p>

@@ -22,28 +22,26 @@
</a>
</p>

A full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as references during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use.
A full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as a reference during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use, and also supports multi-user management and permissions.

![Chatting](/images/screenshots/chatting.gif)
[view more screenshots](/images/screenshots/SCREENSHOTS.md)

### Watch the demo!

[![Watch the video](/images/youtube.png)](https://youtu.be/f95rGD9trL0)


### Product Overview
AnythingLLM aims to be a full-stack application where you can use commercial off-the-shelf LLMs or popular open source LLMs and vectorDB solutions.

Anything LLM is a full-stack product that you can run locally as well as host remotely and be able to chat intelligently with any documents you provide it.
AnythingLLM is a full-stack application where you can use commercial off-the-shelf LLMs or popular open-source LLMs and vectorDB solutions to build a private ChatGPT with no compromises — one you can run locally or host remotely and use to chat intelligently with any documents you provide it.

AnythingLLM divides your documents into objects called `workspaces`. A Workspace functions a lot like a thread, but with the addition of containerization of your documents. Workspaces can share documents, but they do not talk to each other so you can keep your context for each workspace clean.

Some cool features of AnythingLLM
- **Multi-user instance support and permissioning**
- Atomically manage documents in your vector database from a simple UI
- Multiple document type support (PDF, TXT, DOCX, etc)
- Manage documents in your vector database from a simple UI
- Two chat modes `conversation` and `query`. Conversation retains previous questions and amendments. Query is simple QA against your documents
- Each chat response contains a citation that is linked to the original document source
- In-chat citations linked to the original document source and text
- Simple technology stack for fast iteration
- 100% Cloud deployment ready.
- "Bring your own LLM" model.
@@ -52,6 +50,7 @@ Some cool features of AnythingLLM

### Supported LLMs, Embedders, and Vector Databases
**Supported LLMs:**
- [Any open-source llama.cpp compatible model](/server/storage/models/README.md#text-generation-llm-selection)
- [OpenAI](https://openai.com)
- [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service)
- [Anthropic ClaudeV2](https://www.anthropic.com/)
@@ -80,13 +79,18 @@ This monorepo consists of three main sections:
- `server`: A nodeJS + express server to handle all the interactions and do all the vectorDB management and LLM interactions.
- `docker`: Docker instructions and build process + information for building from source.

### Requirements
### Minimum Requirements
> [!TIP]
> Running AnythingLLM on AWS/GCP/Azure?
> You should aim for at least 2GB of RAM. Disk storage is proportional to however much data
> you will be storing (documents, vectors, models, etc). Minimum 10GB recommended.
- `yarn` and `node` on your machine
- `python` 3.9+ for running scripts in `collector/`.
- access to an LLM running locally or remotely.
- (optional) a vector database like Pinecone, Qdrant, Weaviate, or Chroma*.

*AnythingLLM by default uses a built-in vector database powered by [LanceDB](https://github.com/lancedb/lancedb)

*AnythingLLM by default embeds text privately on your instance. [Learn More](/server/storage/models/README.md)

## Recommended usage with Docker (easy!)
@@ -107,8 +111,8 @@ docker run -d -p 3001:3001 \
mintplexlabs/anythingllm:master
```

Go to `http://localhost:3001` and you are now using AnythingLLM! All your data and progress will persist between
container rebuilds or pulls from Docker Hub.
Open [http://localhost:3001](http://localhost:3001) and you are now using AnythingLLM!
All your data and progress will now persist between container rebuilds or pulls from Docker Hub.

[Learn more about running AnythingLLM with Docker](./docker/HOW_TO_USE_DOCKER.md)

7 changes: 6 additions & 1 deletion docker/Dockerfile
@@ -13,7 +13,7 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libx11-6 libx11-xcb1 libxcb1 \
libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 \
libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release \
xdg-utils && \
xdg-utils git build-essential && \
mkdir -p /etc/apt/keyrings && \
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg && \
echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_18.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list && \
@@ -60,6 +60,11 @@ RUN cd ./server/ && yarn install --production && yarn cache clean && \
rm /app/server/node_modules/vectordb/x86_64-apple-darwin.node && \
rm /app/server/node_modules/vectordb/aarch64-apple-darwin.node

# Compile Llama.cpp bindings for node-llama-cpp for this operating system.
USER root
RUN cd ./server && npx --no node-llama-cpp download
USER anythingllm

# Build the frontend
FROM frontend-deps as build-stage
COPY ./frontend/ ./frontend/
84 changes: 84 additions & 0 deletions frontend/src/components/LLMSelection/NativeLLMOptions/index.jsx
@@ -0,0 +1,84 @@
import { useEffect, useState } from "react";
import { Flask } from "@phosphor-icons/react";
import System from "@/models/system";

export default function NativeLLMOptions({ settings }) {
return (
<div className="w-full flex flex-col gap-y-4">
<div className="flex flex-col md:flex-row md:items-center gap-x-2 text-white mb-4 bg-orange-800/30 w-fit rounded-lg px-4 py-2">
<div className="gap-x-2 flex items-center">
<Flask size={18} />
<p className="text-sm md:text-base">
Using a locally hosted LLM is experimental. Use with caution.
</p>
</div>
</div>
<div className="w-full flex items-center gap-4">
<NativeModelSelection settings={settings} />
</div>
</div>
);
}

function NativeModelSelection({ settings }) {
const [customModels, setCustomModels] = useState([]);
const [loading, setLoading] = useState(true);

useEffect(() => {
async function findCustomModels() {
setLoading(true);
const { models } = await System.customModels("native-llm", null, null);
setCustomModels(models || []);
setLoading(false);
}
findCustomModels();
}, []);

if (loading || customModels.length === 0) {
return (
<div className="flex flex-col w-60">
<label className="text-white text-sm font-semibold block mb-4">
Model Selection
</label>
<select
name="NativeLLMModelPref"
disabled={true}
className="bg-zinc-900 border border-gray-500 text-white text-sm rounded-lg block w-full p-2.5"
>
<option disabled={true} selected={true}>
-- waiting for models --
</option>
</select>
</div>
);
}

return (
<div className="flex flex-col w-60">
<label className="text-white text-sm font-semibold block mb-4">
Model Selection
</label>
<select
name="NativeLLMModelPref"
required={true}
className="bg-zinc-900 border border-gray-500 text-white text-sm rounded-lg block w-full p-2.5"
>
{customModels.length > 0 && (
<optgroup label="Your loaded models">
{customModels.map((model) => {
return (
<option
key={model.id}
value={model.id}
selected={settings.NativeLLMModelPref === model.id}
>
{model.id}
</option>
);
})}
</optgroup>
)}
</select>
</div>
);
}
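
The dropdown above only assumes that `System.customModels("native-llm", null, null)` resolves to an object with a `models` array whose entries expose an `id`. A hedged sketch of a response that would populate it — the GGUF filenames are placeholders, not files shipped with AnythingLLM:

```js
// Illustrative response shape for System.customModels("native-llm", null, null);
// the model ids are hypothetical GGUF filenames discovered in local storage.
const exampleResponse = {
  models: [
    { id: "llama-2-7b-chat.Q4_K_M.gguf" },
    { id: "mistral-7b-instruct-v0.1.Q5_K_M.gguf" },
  ],
};
```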
10 changes: 4 additions & 6 deletions frontend/src/components/PrivateRoute/index.jsx
@@ -21,9 +21,8 @@ function useIsAuthenticated() {
const {
MultiUserMode,
RequiresAuth,
OpenAiKey = false,
AnthropicApiKey = false,
AzureOpenAiKey = false,
LLMProvider = null,
VectorDB = null,
} = await System.keys();

setMultiUserMode(MultiUserMode);
@@ -32,9 +31,8 @@
if (
!MultiUserMode &&
!RequiresAuth && // Not in Multi-user AND no password set.
!OpenAiKey &&
!AnthropicApiKey &&
!AzureOpenAiKey // AND no LLM API Key set at all.
!LLMProvider &&
!VectorDB
) {
setShouldRedirectToOnboarding(true);
setIsAuthed(true);
15 changes: 15 additions & 0 deletions frontend/src/pages/GeneralSettings/LLMPreference/index.jsx
@@ -3,6 +3,7 @@ import Sidebar, { SidebarMobileHeader } from "@/components/SettingsSidebar";
import { isMobile } from "react-device-detect";
import System from "@/models/system";
import showToast from "@/utils/toast";
import AnythingLLMIcon from "@/media/logo/anything-llm-icon.png";
import OpenAiLogo from "@/media/llmprovider/openai.png";
import AzureOpenAiLogo from "@/media/llmprovider/azure.png";
import AnthropicLogo from "@/media/llmprovider/anthropic.png";
@@ -15,6 +16,7 @@ import AzureAiOptions from "@/components/LLMSelection/AzureAiOptions";
import AnthropicAiOptions from "@/components/LLMSelection/AnthropicAiOptions";
import LMStudioOptions from "@/components/LLMSelection/LMStudioOptions";
import LocalAiOptions from "@/components/LLMSelection/LocalAiOptions";
import NativeLLMOptions from "@/components/LLMSelection/NativeLLMOptions";

export default function GeneralLLMPreference() {
const [saving, setSaving] = useState(false);
@@ -150,6 +152,16 @@ export default function GeneralLLMPreference() {
image={LocalAiLogo}
onClick={updateLLMChoice}
/>
{!window.location.hostname.includes("useanything.com") && (
<LLMProviderOption
name="Custom Llama Model"
value="native"
description="Use a downloaded custom Llama model for chatting on this AnythingLLM instance."
checked={llmChoice === "native"}
image={AnythingLLMIcon}
onClick={updateLLMChoice}
/>
)}
</div>
<div className="mt-10 flex flex-wrap gap-4 max-w-[800px]">
{llmChoice === "openai" && (
@@ -167,6 +179,9 @@
{llmChoice === "localai" && (
<LocalAiOptions settings={settings} showAlert={true} />
)}
{llmChoice === "native" && (
<NativeLLMOptions settings={settings} />
)}
</div>
</div>
</form>
@@ -52,6 +52,13 @@ const LLM_SELECTION_PRIVACY = {
],
logo: LocalAiLogo,
},
native: {
name: "Custom Llama Model",
description: [
"Your model and chats are only accessible on this AnythingLLM instance",
],
logo: AnythingLLMIcon,
},
};

const VECTOR_DB_PRIVACY = {
@@ -1,4 +1,5 @@
import React, { memo, useEffect, useState } from "react";
import AnythingLLMIcon from "@/media/logo/anything-llm-icon.png";
import OpenAiLogo from "@/media/llmprovider/openai.png";
import AzureOpenAiLogo from "@/media/llmprovider/azure.png";
import AnthropicLogo from "@/media/llmprovider/anthropic.png";
@@ -12,6 +13,7 @@ import AzureAiOptions from "@/components/LLMSelection/AzureAiOptions";
import AnthropicAiOptions from "@/components/LLMSelection/AnthropicAiOptions";
import LMStudioOptions from "@/components/LLMSelection/LMStudioOptions";
import LocalAiOptions from "@/components/LLMSelection/LocalAiOptions";
import NativeLLMOptions from "@/components/LLMSelection/NativeLLMOptions";

function LLMSelection({ nextStep, prevStep, currentStep }) {
const [llmChoice, setLLMChoice] = useState("openai");
@@ -110,6 +112,14 @@ function LLMSelection({ nextStep, prevStep, currentStep }) {
image={LocalAiLogo}
onClick={updateLLMChoice}
/>
<LLMProviderOption
name="Custom Llama Model"
value="native"
description="Use a downloaded custom Llama model for chatting on this AnythingLLM instance."
checked={llmChoice === "native"}
image={AnythingLLMIcon}
onClick={updateLLMChoice}
/>
</div>
<div className="mt-4 flex flex-wrap gap-4 max-w-[752px]">
{llmChoice === "openai" && <OpenAiOptions settings={settings} />}
@@ -121,6 +131,7 @@ function LLMSelection({ nextStep, prevStep, currentStep }) {
<LMStudioOptions settings={settings} />
)}
{llmChoice === "localai" && <LocalAiOptions settings={settings} />}
{llmChoice === "native" && <NativeLLMOptions settings={settings} />}
</div>
</div>
<div className="flex w-full justify-between items-center px-6 py-4 space-x-2 border-t rounded-b border-gray-500/50">
18 changes: 0 additions & 18 deletions images/screenshots/SCREENSHOTS.md

This file was deleted.

Binary file removed images/screenshots/document.png
Binary file removed images/screenshots/home.png
Binary file removed images/screenshots/llm_selection.png
Binary file removed images/screenshots/uploading_doc.gif
Binary file removed images/screenshots/vector_databases.png
9 changes: 7 additions & 2 deletions server/models/systemSettings.js
@@ -14,8 +14,8 @@ const SystemSettings = {
"telemetry_id",
],
currentSettings: async function () {
const llmProvider = process.env.LLM_PROVIDER || "openai";
const vectorDB = process.env.VECTOR_DB || "lancedb";
const llmProvider = process.env.LLM_PROVIDER;
const vectorDB = process.env.VECTOR_DB;
return {
RequiresAuth: !!process.env.AUTH_TOKEN,
AuthToken: !!process.env.AUTH_TOKEN,
@@ -111,6 +111,11 @@
AzureOpenAiEmbeddingModelPref: process.env.EMBEDDING_MODEL_PREF,
}
: {}),
...(llmProvider === "native"
? {
NativeLLMModelPref: process.env.NATIVE_LLM_MODEL_PREF,
}
: {}),
};
},

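Taken together with the frontend changes, the new `native` provider is driven entirely by the server environment, like every other provider. A minimal `.env` sketch, assuming a GGUF model file has already been placed in the server's model storage — the filename below is hypothetical:

```
# Select the built-in llama.cpp-backed LLM
LLM_PROVIDER='native'
# Hypothetical GGUF file in the server's model storage
NATIVE_LLM_MODEL_PREF='llama-2-7b-chat.Q4_K_M.gguf'
```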
5 changes: 3 additions & 2 deletions server/package.json
@@ -41,10 +41,11 @@
"joi-password-complexity": "^5.2.0",
"js-tiktoken": "^1.0.7",
"jsonwebtoken": "^8.5.1",
"langchain": "^0.0.90",
"langchain": "0.0.201",
"mime": "^3.0.0",
"moment": "^2.29.4",
"multer": "^1.4.5-lts.1",
"node-llama-cpp": "^2.8.0",
"openai": "^3.2.1",
"pinecone-client": "^1.1.0",
"posthog-node": "^3.1.1",
@@ -64,4 +65,4 @@
"nodemon": "^2.0.22",
"prettier": "^2.4.1"
}
}
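
The new `node-llama-cpp` dependency is what backs the "persist model in memory for N+1 chats" item from the commit message: loading GGUF weights is expensive, so it only pays off if the model outlives a single request. A minimal sketch of that caching idea, assuming the node-llama-cpp v2 API — this is illustrative, not AnythingLLM's actual provider code:

```js
const path = require("path");

// Module-level cache: survives between requests, so chat N+1 skips the
// expensive weight load and only pays for a fresh context/session.
let cachedModel = null;

async function getLlamaSession(modelFilename) {
  // node-llama-cpp v2 is ESM-only; pull it in with a dynamic import.
  const { LlamaModel, LlamaContext, LlamaChatSession } = await import(
    "node-llama-cpp"
  );

  if (!cachedModel) {
    cachedModel = new LlamaModel({
      // Hypothetical storage layout; adjust to wherever GGUF files live.
      modelPath: path.resolve("server", "storage", "models", modelFilename),
    });
  }

  const context = new LlamaContext({ model: cachedModel });
  return new LlamaChatSession({ context });
}

// Usage sketch:
// const session = await getLlamaSession(process.env.NATIVE_LLM_MODEL_PREF);
// const reply = await session.prompt("Summarize my documents.");
```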

