# RAG using OpenAI, Pinecone and LangChain - PDF Embedding and Querying - 2025

This project demonstrates how to embed PDF documents using OpenAI embeddings, store the embeddings in Pinecone, and query the documents to extract relevant information. It uses LangChain, Pinecone, and OpenAI's GPT models.

## Features

- Load PDF documents and split them into smaller chunks.
- Embed the chunks using OpenAI's `text-embedding-ada-002` model.
- Store the embeddings in Pinecone for efficient similarity search.
- Query the stored embeddings to retrieve relevant information.
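
To make the chunking and similarity-search ideas above concrete, here is a minimal standard-library sketch. It is an illustration only, not the code in this repository: `chunk_text`, `cosine_similarity`, `top_k`, the chunk size, the overlap, and the toy vectors are all hypothetical. Real embeddings come from OpenAI, and the real ranking happens inside Pinecone.

```python
import math


def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing a few characters twice.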

## Prerequisites

Before running this project, ensure you have the following:

- Python 3.11 or later.
- An OpenAI API key.
- A Pinecone API key and an active Pinecone account.
- The required Python packages installed (see below).

## Installation

Clone this repository:

git clone <repository_url>
cd <repository_name>

Install the required Python packages:

pip install langchain openai pinecone-client

Set up environment variables:

- `OPENAI_API_KEY`: Your OpenAI API key.
- `PINECONE_API_KEY`: Your Pinecone API key.

You can create a `.env` file in the project directory:
```
OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
```
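
A `.env` file is not read by Python automatically. Projects commonly load it with the `python-dotenv` package, but whether the scripts here do so is not stated. As a dependency-free sketch, the helpers below (hypothetical names `parse_env_text` and `missing_keys`) parse simple `KEY=VALUE` lines and report which required keys are absent, so a missing key fails early with a clear message.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env):
    """Return the required keys that are unset or empty in the mapping."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Typical use: merge the file into the process environment, then check.
#
#   with open(".env", encoding="utf-8") as fh:
#       os.environ.update(parse_env_text(fh.read()))
#   if missing_keys(os.environ):
#       raise SystemExit(f"Missing: {missing_keys(os.environ)}")
```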

## Usage

### Step 1: Embed PDF and Store in Pinecone

See code in LoadPDF.py
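
For orientation, a step like this often looks roughly like the sketch below. It is not a copy of `LoadPDF.py`. It assumes the split LangChain packages (`langchain-community`, `langchain-openai`, `langchain-pinecone`, `langchain-text-splitters`) and `pypdf` are installed in addition to the packages listed under Installation, and that both API keys are set in the environment. The function name `embed_pdf`, the chunk size, and the overlap are hypothetical.

```python
INDEX_NAME = "langchain1"  # index name taken from the Notes section
EMBEDDING_MODEL = "text-embedding-ada-002"


def embed_pdf(pdf_path, index_name=INDEX_NAME, chunk_size=1000, chunk_overlap=100):
    """Load a PDF, split it into chunks, embed them, and store them in Pinecone.

    Returns the number of chunks that were sent to the index.
    """
    # Imported here so this module can be imported without the optional packages.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_openai import OpenAIEmbeddings
    from langchain_pinecone import PineconeVectorStore
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    pages = PyPDFLoader(pdf_path).load()  # one Document per PDF page
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    chunks = splitter.split_documents(pages)

    # Embeds every chunk and upserts the vectors into the existing index.
    PineconeVectorStore.from_documents(
        chunks,
        embedding=OpenAIEmbeddings(model=EMBEDDING_MODEL),
        index_name=index_name,
    )
    return len(chunks)
```

Calling `embed_pdf("my_document.pdf")` (a placeholder path) makes paid OpenAI API requests and writes to Pinecone, so it is worth trying on a small PDF first.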


### Step 2: Query the Stored Embeddings
See code in TalktoPDF.py
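
The query step generally follows a retrieve-then-answer pattern: fetch the most similar chunks, place them in a prompt, and ask a chat model. The sketch below shows that pattern and is not taken from the repository's script. `build_prompt`, `answer_question`, the prompt wording, and `k=4` are hypothetical. `store` is expected to offer `similarity_search(query, k=...)`, as LangChain vector stores do, and `llm` is expected to offer `invoke(prompt)` returning an object with a `.content` attribute, as LangChain chat models do.

```python
def build_prompt(question, chunks):
    """Combine the retrieved chunk texts and the question into one prompt."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer_question(store, llm, question, k=4):
    """Retrieve the k most similar chunks and ask the model to answer."""
    docs = store.similarity_search(question, k=k)
    prompt = build_prompt(question, [doc.page_content for doc in docs])
    return llm.invoke(prompt).content


# Example wiring (assumes langchain-openai and langchain-pinecone are installed,
# both API keys are set, and the model name is only a placeholder choice):
#
#   from langchain_openai import ChatOpenAI, OpenAIEmbeddings
#   from langchain_pinecone import PineconeVectorStore
#
#   store = PineconeVectorStore.from_existing_index(
#       index_name="langchain1",
#       embedding=OpenAIEmbeddings(model="text-embedding-ada-002"),
#   )
#   llm = ChatOpenAI(model="gpt-4o-mini")
#   print(answer_question(store, llm, "What is this PDF about?"))
```

The query must be embedded with the same model that was used when the PDF was stored, otherwise the similarity scores are meaningless.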

---

## Notes
- Use `text-embedding-ada-002` for cost-effective embeddings.
- Ensure the Pinecone index (`langchain1`) exists before querying.
- Check your Pinecone dashboard to monitor index usage.
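
One way to act on the index note is to check for the index up front instead of failing mid-query. The helper below is a sketch with a hypothetical name, `require_index`. It assumes a Pinecone client object from version 3 or later of the Python client, where `list_indexes().names()` returns the existing index names. If the index has to be created, note that `text-embedding-ada-002` produces 1536-dimensional vectors, so the index dimension must be 1536.

```python
def require_index(pc, name="langchain1"):
    """Raise a clear error if the named Pinecone index does not exist."""
    existing = list(pc.list_indexes().names())
    if name not in existing:
        raise RuntimeError(
            f"Pinecone index {name!r} not found; existing indexes: {existing}"
        )
    return name


# Example wiring (assumes the Pinecone Python client v3+ and PINECONE_API_KEY set):
#
#   import os
#   from pinecone import Pinecone
#
#   pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
#   require_index(pc, "langchain1")
```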

---

## License
This project is licensed under the MIT License.

---

## Acknowledgments
- [LangChain Documentation](https://docs.langchain.com/)
- [OpenAI API Documentation](https://platform.openai.com/docs/)
- [Pinecone Documentation](https://www.pinecone.io/docs/)
