Commit

- General fixes and improvements.
- Added sample setup to use GPU

amithkoujalgi committed Dec 17, 2023
1 parent 70b8530 commit e97290f
Showing 9 changed files with 159 additions and 73 deletions.
13 changes: 13 additions & 0 deletions Makefile
@@ -0,0 +1,13 @@
start:
	docker-compose -f ./docker-compose.yml down -v; \
	docker-compose -f ./docker-compose.yml rm -fsv; \
	docker-compose -f ./docker-compose.yml up --remove-orphans;

start-gpu:
	docker-compose -f ./docker-compose-gpu.yml down -v; \
	docker-compose -f ./docker-compose-gpu.yml rm -fsv; \
	docker-compose -f ./docker-compose-gpu.yml up --remove-orphans;

stop:
	docker-compose -f ./docker-compose.yml down -v; \
	docker-compose -f ./docker-compose.yml rm -fsv;
102 changes: 61 additions & 41 deletions README.md
@@ -14,72 +14,92 @@ The LLMs are downloaded and served via [Ollama](https://github.com/jmorganca/ollama)
- [How to run](#how-to-run)
- [Demo](#demo)
- [Improvements](#improvements)
- [Contributing](#contributing)
- [Credits](#credits)

#### Requirements
### Requirements

[![][shield]][site]

[![][maketool-shield]][maketool-site]

[site]: https://docs.docker.com/compose/

[shield]: https://img.shields.io/badge/Docker_Compose-Installation-blue.svg?style=for-the-badge&labelColor=gray

[maketool-site]: https://www.gnu.org/software/make/

- Docker (with docker-compose)
- Python (for development only)

#### How to run

Define a `docker-compose.yml` file with the following contents.

```yaml
services:

  ollama:
    image: ollama/ollama
    ports:
      - 11434:11434
    volumes:
      - ~/ollama:/root/.ollama
    networks:
      - net

  app:
    image: amithkoujalgi/pdf-bot:1.0.0
    ports:
      - 8501:8501
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - MODEL=orca-mini
    networks:
      - net

networks:
  net:
```

Then run:

```shell
docker-compose up
```

[maketool-shield]: https://img.shields.io/badge/Make-Tool-blue.svg?style=for-the-badge&labelColor=gray

### How to run

#### CPU version

```shell
make start
```

#### GPU version

```shell
make start-gpu
```

When the server is up and running, access the app at: http://localhost:8501
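
To confirm the Ollama backend is reachable before uploading PDFs, a quick check like the following can help (a sketch that assumes the default port mapping of 11434):

```python
# Minimal reachability check for the Ollama container (assumes default port 11434).
import requests

try:
    r = requests.get("http://localhost:11434", timeout=5)
    print(r.status_code, r.text.strip())  # typically prints: 200 Ollama is running
except requests.ConnectionError:
    print("Ollama is not reachable yet -- give the containers a little more time.")
```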

Note:
**Note:**

- It takes a while to start up since it downloads the specified model for the first time.
- If your hardware does not have a GPU and you run on CPU only, expect long response times from the bot.
- Only Nvidia GPUs are supported, as mentioned in Ollama's documentation; other vendors such as AMD aren't supported yet. Read how to use a GPU with the [Ollama container](https://hub.docker.com/r/ollama/ollama) and with [docker-compose](https://docs.docker.com/compose/gpu-support/#:~:text=GPUs%20are%20referenced%20in%20a,capabilities%20.).
- Make sure NVIDIA drivers are set up on your execution environment for the best results; a quick pre-flight check is sketched below.
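
A minimal pre-flight sketch (assuming a Linux host with the NVIDIA driver installed) to confirm a GPU is visible before running `make start-gpu`:

```python
# Pre-flight sketch: confirm the NVIDIA driver is installed and a GPU is visible.
import shutil
import subprocess

if shutil.which("nvidia-smi") is None:
    print("nvidia-smi not found -- install the NVIDIA driver (and the NVIDIA Container Toolkit for Docker).")
else:
    subprocess.run(["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv"], check=False)
```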

Image on DockerHub: https://hub.docker.com/r/amithkoujalgi/pdf-bot

#### [Demo](https://www.youtube.com/watch?v=jJyFslR-oNQ):
### [Demo](https://www.youtube.com/watch?v=jJyFslR-oNQ)

https://github.com/amithkoujalgi/ollama-pdf-bot/assets/1876165/40dc70e6-9d35-4171-9ae6-d82247dbaa17

Sample PDFs:
#### Sample PDFs

[Hl-L2351DW v0522.pdf](https://github.com/amithkoujalgi/ollama-pdf-bot/files/13323209/Hl-L2351DW.v0522.pdf)

[HL-B2080DW v0522.pdf](https://github.com/amithkoujalgi/ollama-pdf-bot/files/13323208/HL-B2080DW.v0522.pdf)

#### Improvements
### Improvements

- [ ] Expose model params such as `temperature`, `top_k`, `top_p` as configurable env vars (a possible shape is sketched below)
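
A possible shape for this item (purely a sketch, not part of the current code) would read the params from the environment alongside the existing `Config` values:

```python
# Hypothetical sketch only: these env vars are not read by the current code.
import os


class ModelParams:
    TEMPERATURE = float(os.environ.get('TEMPERATURE', '0.8'))
    TOP_K = int(os.environ.get('TOP_K', '40'))
    TOP_P = float(os.environ.get('TOP_P', '0.9'))
```

These values could then be forwarded to the Ollama LLM wherever the chain is built.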

#### Credits
### Benchmarks

- The sample PDFs provided above were used for benchmarking.
- LLAMA2 model download took ~6-8 minutes.

#### Devices used

- PC: Intel i9 (9th gen), Nvidia RTX 2080, 32 GB memory
- Laptop: Intel i7 MacBook Pro (2017)

| Model | Device | Operation | Time Taken |
|--------|--------|-------------------------------------------|------------------|
| LLAMA2 | PC | Load embedding model | ~3-4 minutes |
| LLAMA2 | PC | Answer the questions on the uploaded PDFs | ~5-10 seconds |
| LLAMA2 | Laptop | Load embedding model | ~8 minutes |
| LLAMA2 | Laptop | Answer the questions on the uploaded PDFs | ~100-130 seconds |

### Contributing

Contributions are most welcome! Whether it's reporting a bug, proposing an enhancement, or helping
with code - any sort of contribution is much appreciated.

#### Requirements

![Python](https://img.shields.io/badge/python-3.8_+-green.svg)

### Credits

Thanks to the incredible [Ollama](https://github.com/jmorganca/ollama), [Langchain](https://www.langchain.com/) and [Streamlit](https://streamlit.io/) projects.
30 changes: 30 additions & 0 deletions docker-compose-gpu.yml
@@ -0,0 +1,30 @@
services:

  ollama:
    image: ollama/ollama:latest
    ports:
      - 11434:11434
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]
    volumes:
      - ~/ollama:/root/.ollama
    networks:
      - net

  app:
    image: amithkoujalgi/pdf-bot:1.0.0
    ports:
      - 8501:8501
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - MODEL=llama2
    networks:
      - net

networks:
  net:
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -15,7 +15,7 @@ services:
- 8501:8501
environment:
- OLLAMA_API_BASE_URL=http://ollama:11434
- MODEL=orca-mini
- MODEL=llama2
networks:
- net

9 changes: 6 additions & 3 deletions pdf_bot/app.py
@@ -4,13 +4,16 @@

import streamlit as st

from pdf_helper import PDFHelper
from config import Config
from pdf_helper import PDFHelper, load_embedding_model

load_embedding_model(model_name=Config.EMBEDDING_MODEL_NAME)

title = "PDF Bot"

model_name = os.environ.get('MODEL', "orca-mini")
model_name = Config.MODEL

ollama_api_base_url = os.environ.get('OLLAMA_API_BASE_URL', "http://localhost:11434")
ollama_api_base_url = Config.OLLAMA_API_BASE_URL
pdfs_directory = os.path.join(str(Path.home()), 'langchain-store', 'uploads', 'pdfs')
os.makedirs(pdfs_directory, exist_ok=True)

9 changes: 9 additions & 0 deletions pdf_bot/config.py
@@ -0,0 +1,9 @@
import os


class Config:
    MODEL = os.environ.get('MODEL', "llama2")
    EMBEDDING_MODEL_NAME = os.environ.get('EMBEDDING_MODEL_NAME', "all-MiniLM-L6-v2")
    OLLAMA_API_BASE_URL = os.environ.get('OLLAMA_API_BASE_URL', "http://localhost:11434")
    HUGGING_FACE_EMBEDDINGS_DEVICE_TYPE = os.environ.get('HUGGING_FACE_EMBEDDINGS_DEVICE_TYPE', "cpu")
28 changes: 16 additions & 12 deletions pdf_bot/pdf_helper.py
@@ -12,6 +12,8 @@
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

from config import Config


# This loads the PDF file
def load_pdf_data(file_path):
@@ -42,10 +44,10 @@ def split_docs(documents, chunk_size=1000, chunk_overlap=20):


# function for loading the embedding model
def load_embedding_model(model_path, normalize_embedding=True):
def load_embedding_model(model_name, normalize_embedding=True):
return HuggingFaceEmbeddings(
model_name=model_path,
model_kwargs={'device': 'cpu'}, # here we will run the model with CPU only
model_name=model_name,
model_kwargs={'device': Config.HUGGING_FACE_EMBEDDINGS_DEVICE_TYPE},  # device comes from config (defaults to 'cpu')
encode_kwargs={
'normalize_embeddings': normalize_embedding # keep True to compute cosine similarity
}
@@ -78,19 +80,19 @@ def load_qa_chain(retriever, llm, prompt):
def get_response(query, chain) -> str:
# Get response from chain
response = chain({'query': query})

# Wrap the text for better output in Jupyter Notebook
wrapped_text = textwrap.fill(response['result'], width=100)
return wrapped_text
res = response['result']
# wrapped_text = textwrap.fill(res, width=100)
return res


class PDFHelper:

def __init__(self, ollama_api_base_url: str, model_name: str = "orca-mini",
embedding_model_path: str = "all-MiniLM-L6-v2"):
def __init__(self, ollama_api_base_url: str, model_name: str = Config.MODEL,
embedding_model_name: str = Config.EMBEDDING_MODEL_NAME):
self._ollama_api_base_url = ollama_api_base_url
self._model_name = model_name
self._embedding_model_path = embedding_model_path
self._embedding_model_name = embedding_model_name

def ask(self, pdf_file_path: str, question: str) -> str:
vector_store_directory = os.path.join(str(Path.home()), 'langchain-store', 'vectorstore',
@@ -116,7 +118,7 @@ def ask(self, pdf_file_path: str, question: str) -> str:
)

# Load the Embedding Model
embed = load_embedding_model(model_path=self._embedding_model_path)
embed = load_embedding_model(model_name=self._embedding_model_name)

# load and split the documents
docs = load_pdf_data(file_path=pdf_file_path)
@@ -130,8 +132,9 @@ def ask(self, pdf_file_path: str, question: str) -> str:

template = """
### System:
You are an respectful and honest assistant. You have to answer the user's questions using only the context \
provided to you. If you don't know the answer, just say you don't know. Don't try to make up an answer.
You are an honest assistant.
You will accept PDF files and you will answer the question asked by the user appropriately.
If you don't know the answer, just say you don't know. Don't try to make up an answer.
### Context:
{context}
@@ -151,4 +154,5 @@ def ask(self, pdf_file_path: str, question: str) -> str:
end_time = time.time()

print(f"Response time: {end_time - start_time} seconds.\n")

return response.strip()
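
For reference, a minimal usage sketch of the class above (the PDF path and question are placeholders; `model_name` and `embedding_model_name` fall back to the `Config` defaults):

```python
# Sketch: ask a question about a local PDF through PDFHelper.
from pdf_helper import PDFHelper

helper = PDFHelper(ollama_api_base_url="http://localhost:11434")
answer = helper.ask(
    pdf_file_path="/path/to/some-manual.pdf",   # placeholder path
    question="How do I reset the device to factory settings?",
)
print(answer)
```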
23 changes: 15 additions & 8 deletions pdf_bot/pull_model.py
@@ -1,22 +1,29 @@
import json
import os

import requests

model_name = os.environ.get('MODEL', "orca-mini")
ollama_api_base_url = os.environ.get('OLLAMA_API_BASE_URL', "http://localhost:11434")
from config import Config

model_name = Config.MODEL
ollama_api_base_url = Config.OLLAMA_API_BASE_URL
print(f"Using model: {model_name}")
print(f"Using Ollama base URL: {ollama_api_base_url}")


def pull_model(model_name_):
print(f"pulling model '{model_name_}'...")
print(f"Pulling model '{model_name_}'...")
url = f"{ollama_api_base_url}/api/pull"
data = json.dumps(dict(name=model_name_))
print(data)
headers = {'Content-Type': 'application/json'}
_response = requests.post(url, data=data, headers=headers)
print(_response.text)

# Use stream=True to handle streaming response
with requests.post(url, data=data, headers=headers, stream=True) as response:
if response.status_code == 200:
# Process the response content in chunks
for chunk in response.iter_content(chunk_size=1024):
if chunk:
print(chunk.decode('utf-8'), end='') # Replace 'utf-8' with the appropriate encoding
else:
print(f"Error: {response.status_code} - {response.text}")


pull_model(model_name_=model_name)
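
The `/api/pull` endpoint streams newline-delimited JSON progress objects; if structured progress output is preferred over raw chunks, a variant like this sketch could parse each line (the `status` field follows Ollama's progress messages, but treat the exact fields as an assumption):

```python
# Sketch: parse pull progress as JSON lines instead of printing raw chunks.
import json

import requests


def pull_model_json(base_url: str, name: str) -> None:
    with requests.post(f"{base_url}/api/pull", json={"name": name}, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line:
                continue
            event = json.loads(line)
            # Each event usually carries a "status" field, e.g. "pulling manifest" or "success".
            print(event.get("status", event))
```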
16 changes: 8 additions & 8 deletions requirements.txt
@@ -1,8 +1,8 @@
langchain
streamlit
replicate
pymupdf
huggingface-hub
faiss-cpu
sentence-transformers
requests
langchain==0.0.334
streamlit==1.28.1
replicate==0.18.1
pymupdf==1.23.6
huggingface-hub==0.17.3
faiss-cpu==1.7.4
sentence-transformers==2.2.2
requests==2.31.0
