Video link: https://drive.google.com/file/d/1L-q7F59f1DgNA-YzworVFT4NaC4HFGJK/view?usp=sharing
This project integrates multiple AI models and a web search API to process images, generate search queries, and retrieve relevant information. It demonstrates how various AI models and APIs can be combined to create a system that understands visual content, interprets user queries, and fetches related information from the internet.
The pipeline begins with the MiniCPM-V-2 model generating a caption for a given image; this caption describes the image's content in text. The Qwen2.5-1.5B-Instruct model then takes both the caption and any additional user input to create a relevant search query. This query is sent to the Brave Search API, which retrieves both text-based and image-based search results from the web. The pipeline runs on two Google Colab notebooks acting as servers, exposed through FastAPI and ngrok, and the web app is built with Streamlit.
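The data flow described above can be sketched as a small function with the three model/API calls injected as callables. This is only an illustration of the pipeline's shape: the function and parameter names are not the project's actual code, and the callables stand in for the MiniCPM-V-2 endpoint, the Qwen2.5-1.5B-Instruct endpoint, and the Brave Search API call respectively.

```python
def run_pipeline(image_url, user_input, caption_fn, query_fn, search_fn):
    """Illustrative data flow: image -> caption -> search query -> results.

    caption_fn stands in for the MiniCPM-V-2 Colab endpoint,
    query_fn for the Qwen2.5-1.5B-Instruct endpoint, and
    search_fn for the Brave Search API call.
    """
    caption = caption_fn(image_url)               # describe the image
    search_query = query_fn(caption, user_input)  # combine caption + user input
    return search_fn(search_query)                # fetch web results
```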
The system is designed to demonstrate how different AI models can work together to understand images, create useful search queries, and retrieve relevant results, making it applicable for various domains such as e-commerce, content discovery, and more complex AI-driven applications.
- Image captioning using MiniCPM-V-2 model
- Query generation based on image captions and user input via Qwen2.5-1.5B-Instruct model
- Text and image search through Brave Search API
- Integration of AI models and external APIs for comprehensive information retrieval
To run this app, you will need:
- A Google account (for accessing Colab)
- Python 3.8+
- Streamlit installed (`pip install streamlit`)
- ngrok installed (`pip install pyngrok`)
- A Brave API token
- A Hugging Face API token
There are two Jupyter notebook files to be run in Colab:
- Image Description MiniCPM.ipynb
- Search Query Qwen.ipynb
- Upload the two notebook files (`Image Description MiniCPM.ipynb` and `Search Query Qwen.ipynb`) to Google Colab.
- Open and run both notebooks in Colab. One of the notebooks requires this Hugging Face API key: `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`
- After execution, each notebook will start a Colab server and print an ngrok URL.
- Copy the ngrok URLs from the Colab outputs.
- Open the `image-query-app.py` file in your local environment.
- Paste the two ngrok URLs generated in the previous step into the appropriate placeholders inside `image-query-app.py`:
  - One URL for the image description service
  - Another URL for the search query service
You will need to update the two URLs in your local file based on the URLs generated from each of the Colab notebooks. Update the following variables:
url = "https://<new-ngrok-url-1>.ngrok-free.app/rag_pipeline"  # URL from notebook 1
url2 = "https://<new-ngrok-url-2>.ngrok-free.app/web_search"   # URL from notebook 2
Replace each `<new-ngrok-url-x>` with the ngrok URL from the respective Colab notebook's output.
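A stray double slash in the path is easy to introduce when pasting, so one way to build these URLs safely is a small helper. This is a sketch only; the project itself simply hard-codes the two strings, and `build_endpoint` is a hypothetical name.

```python
def build_endpoint(ngrok_host: str, path: str) -> str:
    """Join an ngrok hostname and an endpoint path without double slashes."""
    return f"https://{ngrok_host}.ngrok-free.app/{path.lstrip('/')}"

url = build_endpoint("<new-ngrok-url-1>", "rag_pipeline")  # URL from notebook 1
url2 = build_endpoint("<new-ngrok-url-2>", "web_search")   # URL from notebook 2
```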
- Open a terminal or command prompt.
- Navigate to the directory where the `image-query-app.py` file is located.
- Run the web app using Streamlit: `streamlit run image-query-app.py`
- Your default browser will open with the web app interface.
- In the browser interface, you will be prompted to enter:
  - An image URL
  - A search query
  - Your Brave API token (you can use the provided token: `BSAic7Uf6UTCyDixegz4eOnERiS-M4P`)
- After entering the required inputs, press the "Submit" button.
- The app will connect to the Colab servers, process your request, and return the final output as a markdown file.
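The markdown output might be assembled along these lines. This is a sketch under assumptions: the field names `title` and `url` are guesses about the Brave Search response shape, and `results_to_markdown` is a hypothetical helper, not the app's actual code.

```python
def results_to_markdown(caption: str, query: str, results: list) -> str:
    """Format the image caption, search query, and hits as a markdown report."""
    lines = [
        f"## Results for: {query}",
        f"**Image caption:** {caption}",
        "",
    ]
    # Each hit is assumed to carry a 'title' and a 'url' field.
    for hit in results:
        lines.append(f"- [{hit['title']}]({hit['url']})")
    return "\n".join(lines)
```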
That’s it! You now have a fully functional web app for querying images using Brave API and Colab-based models.
https://drive.google.com/file/d/1uRNTHi2F0sX6i5pd6RinDOQyrf1NZ83f/view?usp=sharing
https://drive.google.com/file/d/16ElHYbes6Fl5Y5mk671tFUL8Jc2jhZkH/view?usp=sharing
https://drive.google.com/file/d/1ZlEAy_WU5UxOaRLBvWZv4wVzNe4T7mil/view?usp=sharing
https://drive.google.com/file/d/1tN8hQ6Y-yqi7RuYCuKdynyW-hrUr7PqX/view?usp=sharing
https://drive.google.com/file/d/1DGBq7Qi4aLFn4bCaJfGxnEBFGHDJilPD/view?usp=sharing
This repository contains the implementation of a Retrieval-Augmented Generation (RAG) pipeline that uses a combination of document retrieval and natural language generation to answer user queries based on a given corpus of documents. It is integrated with a FastAPI-based web app for query processing and response generation.
- Project Overview
- How it Works
- Features
- Setup and Installation
- Usage
- Project Structure
- Contributing
- License
This RAG pipeline is designed to answer complex queries using both document retrieval and generative language models. The pipeline classifies queries into different types, retrieves relevant documents from a corpus, and then uses a language model to generate an answer based on the retrieved documents. The pipeline supports three query types:
- Inference Queries
- Temporal Queries
- Comparison Queries
The system retrieves and augments data from the corpus to improve answer quality and accuracy.
The pipeline is built using the following core components:
- Entity Extraction: Extracts key entities such as topics, article names, and actions from the user query.
- Document Retrieval: Based on extracted entities, the pipeline searches the corpus to find relevant documents.
- Query Classification: Classifies the query into one of three categories: `inference_query`, `temporal_query`, or `comparison_query`.
- Answer Generation: The retrieved documents are passed to Meta-Llama-3.1 to generate the final answer. For some queries, the system breaks the query into smaller sub-queries, retrieves the relevant parts, and combines them to produce the final answer.
- FastAPI Integration: The pipeline is accessible through a FastAPI web application for easy interaction.
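To illustrate the classification step's interface, a toy keyword heuristic is shown below. This is only a sketch: the actual pipeline classifies queries with a language model, and the keyword lists here are invented.

```python
def classify_query(query: str) -> str:
    """Toy classifier returning one of the three query types.

    Illustrative only: the real pipeline uses an LLM for this step.
    """
    q = f" {query.lower()} "
    if any(w in q for w in (" compare ", " versus ", " vs ")):
        return "comparison_query"
    if any(w in q for w in (" when ", " before ", " after ", " since ")):
        return "temporal_query"
    return "inference_query"
```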
- Multi-step query processing: Queries are classified into different types, and the pipeline adjusts its retrieval and generation strategies accordingly.
- Efficient document retrieval: Documents are retrieved based on keywords, with support for actions, topics, and article names.
- Scalable: Can be integrated with large document corpora for efficient searching and answering.
- Flexible query handling: Breaks down complex queries into smaller parts to provide more accurate answers.
- Web Integration: Accessible through a FastAPI web app for querying and results display.
This project integrates a Retrieval-Augmented Generation (RAG) pipeline using FastAPI, exposed through ngrok. The FastAPI endpoints are deployed from four separate Google Colab notebooks, each providing one public URL. These URLs need to be integrated into your local `app.py` file for the specific query types.
Make sure you have the required packages installed locally or in your Colab notebook:
pip install pyngrok fastapi uvicorn pydantic
You need to run four separate Colab notebooks, each responsible for a different part of the RAG pipeline. Follow these steps for each notebook:
- Open the Colab notebook.
- Execute all cells in the notebook. The notebook will:
  - Install necessary dependencies.
  - Start a FastAPI app with the corresponding RAG pipeline.
  - Generate a public URL using ngrok.
- After each notebook runs, you will see a unique ngrok URL. For example:
  `FastAPI is publicly available at: http://<ngrok-generated-url>.ngrok.io`
- Copy the generated ngrok URL from each notebook; it will be used in your local `app.py` file.
You will need to update the four URLs in your local `app.py` file based on the URLs generated from each of the four Colab notebooks. Update the following variables:
compurl = "https://<new-ngrok-url-1>.ngrok-free.app/rag-pipeline" # URL from notebook 1
classurl = "https://<new-ngrok-url-2>.ngrok-free.app/rag-pipeline" # URL from notebook 2
temporalurl = "https://<new-ngrok-url-3>.ngrok-free.app/rag_pipeline" # URL from notebook 3
infurl = "https://<new-ngrok-url-4>.ngrok-free.app/rag_pipeline" # URL from notebook 4
Replace each `<new-ngrok-url-x>` with the ngrok URL from the respective Colab notebook's output.
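Inside `app.py`, these four URLs are presumably routed by query type. A sketch of that dispatch is shown below; the variable names mirror the snippet above, but the routing logic itself is an assumption about how the app is wired.

```python
compurl = "https://<new-ngrok-url-1>.ngrok-free.app/rag-pipeline"      # comparison
classurl = "https://<new-ngrok-url-2>.ngrok-free.app/rag-pipeline"     # classification
temporalurl = "https://<new-ngrok-url-3>.ngrok-free.app/rag_pipeline"  # temporal
infurl = "https://<new-ngrok-url-4>.ngrok-free.app/rag_pipeline"       # inference

# classurl is presumably called first to classify the query; the result
# is then routed to the matching answer-generation endpoint.
ENDPOINTS = {
    "comparison_query": compurl,
    "temporal_query": temporalurl,
    "inference_query": infurl,
}

def endpoint_for(query_type: str) -> str:
    """Return the ngrok endpoint URL for a classified query type."""
    return ENDPOINTS[query_type]
```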
After updating the URLs, run the `app.py` file locally:
python app.py
Your local app will now send requests to the four different FastAPI endpoints hosted via ngrok, each handling a specific query type in the RAG pipeline.
- Run all four Colab notebooks. Each notebook will expose one of the four RAG pipeline endpoints using ngrok.
- Update the URLs in your `app.py` with the ngrok URLs generated by the notebooks.
- Run `app.py` to interact with the RAG pipeline endpoints through your local app.

- Ngrok session expiration: Free-tier ngrok sessions expire after some time. If this happens, rerun the corresponding Colab notebook to get a new public URL and update your `app.py` accordingly.
- Colab session management: Ensure all four Colab sessions remain active for uninterrupted API access.
This guide should help you manage the integration of multiple Colab notebooks with your local app using ngrok.