Build a RAG System with Pinecone and LangGraph
Introduction
If you want your AI app to answer questions using your own data, building a **RAG system with Pinecone and LangGraph** is the ideal approach. This guide walks you through every step, from setting up your environment to indexing documents and querying your AI, ensuring accurate, context-driven answers.
In this guide, you’ll learn how to build a production-ready RAG pipeline using:
- LangGraph for orchestration
- OpenAI for embeddings + LLM
- Pinecone for vector search
By the end, you’ll have a complete, working system you can run from the command line.
What is a RAG System with Pinecone and LangGraph?
A Retrieval-Augmented Generation (RAG) system works like this:
- User asks a question
- Convert question → embedding
- Search vector database
- Retrieve relevant chunks
- Send context + question to LLM
- Generate an accurate answer
👉 This prevents hallucinations and makes your AI grounded in real data.
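The whole loop above fits in a few lines once embedding, retrieval, and generation are abstracted behind functions. Here is a minimal sketch with toy stand-ins so it runs without any API keys (every name here is illustrative, not part of the real pipeline built later):

```python
def answer_question(question, embed, search, llm):
    """Minimal RAG loop: embed -> retrieve -> assemble context -> generate."""
    vector = embed(question)            # question -> embedding
    chunks = search(vector, top_k=3)    # nearest chunks from the vector DB
    context = "\n".join(chunks)         # assemble context
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)                  # grounded answer

# Toy stand-ins so the flow is runnable without any keys:
embed = lambda q: [float(len(q))]
search = lambda v, top_k: ["Pinecone is a vector database."]
llm = lambda p: "Answer based on: " + p.splitlines()[1]

print(answer_question("What is Pinecone?", embed, search, llm))
# → Answer based on: Pinecone is a vector database.
```

In a real system the three stand-ins become an embedding model, a vector index, and a chat model, which is exactly what the rest of this guide wires up.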
RAG Architecture with Pinecone and LangGraph
The pipeline in a RAG system looks like this:
- Query Input
- Embedding model (OpenAI, Cohere, Gemini, etc.)
- Vector Search (Pinecone)
- Context Assembly
- LLM Generation
LangGraph connects all of this into a clean execution flow.
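Stripped of the libraries, that execution flow is just a state dictionary passed through a chain of steps, each step reading the state and returning updates. This toy sketch (stand-in lambdas, no API calls) shows the pattern that LangGraph formalizes:

```python
def run_pipeline(state, steps):
    # Each step reads the state and returns a dict of updates,
    # mirroring how LangGraph nodes update graph state
    for step in steps:
        state.update(step(state))
    return state

steps = [
    lambda s: {"embedding": [float(len(s["query"]))]},     # embed
    lambda s: {"results": ["chunk about " + s["query"]]},  # vector search
    lambda s: {"answer": "Based on: " + s["results"][0]},  # LLM generation
]

final = run_pipeline({"query": "RAG"}, steps)
print(final["answer"])
# → Based on: chunk about RAG
```

LangGraph adds typed state, named nodes, and explicit edges on top of this idea, which pays off once flows branch or loop.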
Step 1 – Set Up Your Python Environment
Create a virtual environment
Activate it
Windows (PowerShell):
Mac/Linux:
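The commands for the two steps above, assuming Python 3 is on your PATH as `python` and the environment is named `venv`:

```shell
# Create the virtual environment
python -m venv venv

# Activate it - Windows (PowerShell):
#   venv\Scripts\Activate.ps1
# Activate it - Mac/Linux:
. venv/bin/activate
```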
Select interpreter in VS Code
- Press Ctrl + Shift + P
- Type: Python: Select Interpreter
- Choose your venv
Step 2 – Initialize Git + Create GitHub Repository
Initialize Git locally
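Assuming Git is installed, run this from the project root:

```shell
git init
```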
Create .gitignore
.env
__pycache__/
venv/
First commit
git add .
git commit -m "Initial RAG project setup"
Create GitHub Repository
Go to GitHub and:
- Click New Repository
- Name it: rag-project
- Do NOT initialize with README (you already have files)
Connect local repo to GitHub
git remote add origin https://github.com/<your-username>/rag-project.git
git branch -M main
git push -u origin main
Step 3 – Install Dependencies for RAG System with Pinecone and LangGraph
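The original package list isn’t shown here, but the imports in the two scripts below imply these five PyPI packages (`dotenv` is published as `python-dotenv`):

```shell
pip install openai pinecone langgraph pypdf python-dotenv
```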
Step 4 – Set Up Environment Variables
Create .env:
OPENAI_API_KEY=your_key
PINECONE_API_KEY=your_key
PINECONE_INDEX_NAME=rag-index
PINECONE_ENV=us-east-1
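To sanity-check the file without any extra dependencies, here is a tiny stand-in for `python-dotenv` (the real scripts below use the library itself; this `load_env` helper is just for illustration):

```python
import os

def load_env(path=".env"):
    """Parse KEY=VALUE lines from a .env file into os.environ."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

load_env()
for key in ("OPENAI_API_KEY", "PINECONE_API_KEY",
            "PINECONE_INDEX_NAME", "PINECONE_ENV"):
    print(key, "set" if os.getenv(key) else "MISSING")
```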
Step 5 – Add PDFs and Text Files to Pinecone
File: ingest_docs.py
This file:
- Reads your files from the data/ folder
- Breaks them into smaller chunks
- Converts each chunk into embeddings (vectors)
- Stores them in Pinecone
Think of this as:
“Load my knowledge into the AI’s memory.”
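Before the full script, here is the chunking idea in isolation: overlapping windows, so text that falls on a chunk boundary still appears intact in at least one chunk. The sizes here are shrunk for illustration (the script uses 500 characters with 100 overlap):

```python
def chunk_text(text, chunk_size=20, overlap=5):
    """Split text into overlapping fixed-size chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        # Advance by chunk_size minus overlap, so chunks share `overlap` chars
        start += chunk_size - overlap
    return chunks

print(chunk_text("abcdefghijklmnopqrstuvwxyz"))
# → ['abcdefghijklmnopqrst', 'pqrstuvwxyz']
```

Notice that `pqrst` appears in both chunks: that repetition is the overlap doing its job.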
```python
# ============================================
# INGEST SCRIPT (LOAD DATA INTO PINECONE)
# ============================================
import os
import uuid
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone
from pypdf import PdfReader

# Load API keys from .env file
load_dotenv()

# Create OpenAI client (used for embeddings)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Connect to Pinecone
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index(os.getenv("PINECONE_INDEX_NAME"))

# Folder where your documents live
DATA_FOLDER = "../data"

# ---------------------------
# READ TEXT FILE
# ---------------------------
def load_txt(path):
    # Open and read text file
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# ---------------------------
# READ PDF FILE
# ---------------------------
def load_pdf(path):
    reader = PdfReader(path)
    text = ""
    # Loop through each page
    for page in reader.pages:
        page_text = page.extract_text()
        if page_text:
            text += page_text + "\n"
    return text

# ---------------------------
# SPLIT TEXT INTO CHUNKS
# ---------------------------
def chunk_text(text, chunk_size=500, overlap=100):
    chunks = []
    start = 0
    # Break text into overlapping chunks
    while start < len(text):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        # Move forward but keep some overlap
        start += chunk_size - overlap
    return chunks

# ---------------------------
# CREATE EMBEDDINGS (BATCH)
# ---------------------------
def embed_batch(texts):
    # Convert text into vectors
    res = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    # Return list of embeddings
    return [d.embedding for d in res.data]

# ---------------------------
# PROCESS ONE FILE
# ---------------------------
def process_file(file_path):
    print(f"\nProcessing: {file_path}")

    # Decide how to load file
    if file_path.endswith(".txt"):
        text = load_txt(file_path)
    elif file_path.endswith(".pdf"):
        text = load_pdf(file_path)
    else:
        print("Skipping unsupported file")
        return

    # Skip empty files
    if not text.strip():
        print("Empty file")
        return

    # Break text into chunks
    chunks = chunk_text(text)
    print(f"{len(chunks)} chunks created")

    # Convert chunks into embeddings
    embeddings = embed_batch(chunks)

    vectors = []
    # Prepare data for Pinecone
    for i, chunk in enumerate(chunks):
        vectors.append({
            "id": str(uuid.uuid4()),   # unique ID
            "values": embeddings[i],   # vector
            "metadata": {
                "text": chunk,         # original text
                "source": os.path.basename(file_path)
            }
        })

    # Upload to Pinecone
    index.upsert(vectors)
    print(f"Inserted {len(vectors)} chunks")

# ---------------------------
# PROCESS ALL FILES
# ---------------------------
def ingest_all():
    # Loop through every file in /data
    for filename in os.listdir(DATA_FOLDER):
        process_file(os.path.join(DATA_FOLDER, filename))

# ---------------------------
# RUN SCRIPT
# ---------------------------
if __name__ == "__main__":
    ingest_all()
```

Step 6 – Create Code to Connect LLM, Database, and User Input Using LangGraph
File: lgdemo.py
👉 This script:
- Prompts you to ask a question
- Converts your question into an embedding
- Searches Pinecone for relevant data
- Sends that data to the AI
- Returns a final answer
Think of this as:
“Ask questions using my data”
```python
# ============================================
# MAIN RAG PIPELINE (ASK QUESTIONS)
# ============================================
import os
import logging
from typing import TypedDict, List
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec
from langgraph.graph import StateGraph, END

# Load API keys
load_dotenv()

# Enable logging
logging.basicConfig(level=logging.INFO)

# ---------------------------
# DEFINE DATA STRUCTURE
# ---------------------------
class GraphState(TypedDict):
    query: str              # user question
    embedding: List[float]  # vector form of question
    results: list           # search results
    answer: str             # final AI answer

# ---------------------------
# EMBEDDING SERVICE
# ---------------------------
class EmbeddingService:
    def __init__(self, client):
        self.client = client

    def embed(self, text):
        # Convert text into vector
        res = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return res.data[0].embedding

# ---------------------------
# PINECONE SERVICE
# ---------------------------
class PineconeService:
    def __init__(self):
        self.pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
        self.index_name = os.getenv("PINECONE_INDEX_NAME")

        # Check if index exists
        existing = [i.name for i in self.pc.list_indexes()]
        if self.index_name not in existing:
            logging.info("Creating Pinecone index...")
            self.pc.create_index(
                name=self.index_name,
                dimension=1536,
                metric="cosine",
                spec=ServerlessSpec(
                    cloud="aws",
                    region=os.getenv("PINECONE_ENV")
                )
            )

        # Connect to index
        self.index = self.pc.Index(self.index_name)

    def query(self, vector):
        # Search Pinecone using vector
        return self.index.query(
            vector=vector,
            top_k=3,
            include_metadata=True
        ).matches

# ---------------------------
# LLM SERVICE
# ---------------------------
class LLMService:
    def __init__(self):
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def generate(self, query, context):
        # Build prompt with context
        prompt = f"""
Use the context below to answer the question.

Context:
{context}

Question:
{query}
"""
        # Ask AI for answer
        res = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return res.choices[0].message.content

# ---------------------------
# LANGGRAPH PIPELINE
# ---------------------------
class LangGraphService:
    def __init__(self):
        client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

        # Initialize services
        self.embedder = EmbeddingService(client)
        self.vector = PineconeService()
        self.llm = LLMService()

        # Create graph
        self.graph = StateGraph(GraphState)
        self._build()

        # Compile graph
        self.app = self.graph.compile()

    # Step 1: get question from user
    def get_query(self, state):
        return {"query": input("Enter your question: ")}

    # Step 2: convert question to vector
    def embed_query(self, state):
        return {"embedding": self.embedder.embed(state["query"])}

    # Step 3: search Pinecone
    def vector_search(self, state):
        return {"results": self.vector.query(state["embedding"])}

    # Step 4: generate answer
    def generate_answer(self, state):
        # Combine retrieved text
        context = "\n".join([r.metadata["text"] for r in state["results"]])
        # Ask AI
        answer = self.llm.generate(state["query"], context)
        return {"answer": answer}

    # Build pipeline flow
    def _build(self):
        # Node names must not collide with state keys like "query" or
        # "answer" - LangGraph rejects those with a ValueError
        self.graph.add_node("get_query", self.get_query)
        self.graph.add_node("embed_query", self.embed_query)
        self.graph.add_node("vector_search", self.vector_search)
        self.graph.add_node("generate_answer", self.generate_answer)

        # Define flow order
        self.graph.set_entry_point("get_query")
        self.graph.add_edge("get_query", "embed_query")
        self.graph.add_edge("embed_query", "vector_search")
        self.graph.add_edge("vector_search", "generate_answer")
        self.graph.add_edge("generate_answer", END)

    # Run the app
    def run(self):
        result = self.app.invoke({})
        print("\nAnswer:\n")
        print(result["answer"])

# ---------------------------
# START PROGRAM
# ---------------------------
if __name__ == "__main__":
    service = LangGraphService()
    service.run()
```
Step 7 – Run Your RAG System with Pinecone and LangGraph
python lgdemo.py

