Build a RAG System with Pinecone and LangGraph

Introduction

If you want your AI app to answer questions using your own data, building a **RAG system with Pinecone and LangGraph** is the ideal approach. This guide walks you through every step, from setting up your environment to indexing documents and querying your AI, ensuring accurate, context-driven answers.

In this guide, you’ll learn how to build a production-ready RAG pipeline using:

  • LangGraph for orchestration
  • OpenAI for embeddings + LLM
  • Pinecone for vector search

By the end, you’ll have a working end-to-end system you can run locally against your own documents.

What is a RAG System with Pinecone and LangGraph?

A Retrieval-Augmented Generation (RAG) system works like this:

  1. User asks a question
  2. Convert question → embedding
  3. Search vector database
  4. Retrieve relevant chunks
  5. Send context + question to LLM
  6. Generate an accurate answer

👉 This prevents hallucinations and makes your AI grounded in real data.
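
The six steps above can be sketched end-to-end in a few lines. This is a toy version, not the real pipeline built later in this guide: the embedding, vector search, and LLM steps are faked with simple word-overlap scoring and string formatting, just to make the data flow concrete.

```python
# Toy RAG flow: a real system swaps in an embedding model,
# a vector database, and an LLM for these stand-ins.

def embed(text: str) -> set:
    # Stand-in for an embedding model: a bag of lowercase words
    return set(text.lower().split())

def search(query_vec: set, documents: list[str], top_k: int = 2) -> list[str]:
    # Stand-in for vector search: rank documents by word overlap
    scored = sorted(documents, key=lambda d: len(query_vec & embed(d)), reverse=True)
    return scored[:top_k]

def generate(question: str, context: list[str]) -> str:
    # Stand-in for the LLM call: just show what would be sent
    return f"Answer to {question!r} using context: {context}"

docs = [
    "Pinecone is a managed vector database.",
    "LangGraph orchestrates multi-step LLM pipelines.",
    "Bananas are rich in potassium.",
]

question = "What is a vector database?"
answer = generate(question, search(embed(question), docs))
print(answer)
```

The rest of this guide replaces each stand-in with the real thing: OpenAI embeddings, Pinecone search, and a GPT model for generation.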

RAG Architecture with Pinecone and LangGraph

The pipeline in a RAG system looks like this:

  • Query Input
  • Embedding (OpenAI, Cohere, Gemini, etc.)
  • Vector Search (Pinecone)
  • Context Assembly
  • LLM Generation

LangGraph connects all of this into a clean execution flow.

Step 1 – Create Project in VS Code + Virtual Environment

Open VS Code and create a new folder:

mkdir rag-project
cd rag-project
code .

Directory structure:

rag-project/
├── lgdemo.py
├── ingest_docs.py
├── .env
├── data/
└── .gitignore

Create a virtual environment

python -m venv venv

Activate it

Windows (PowerShell):

venv\Scripts\activate

Mac/Linux:

source venv/bin/activate

Select interpreter in VS Code

  • Press Ctrl + Shift + P
  • Type: Python: Select Interpreter
  • Choose your venv

Step 2 – Initialize Git + Create GitHub Repository

Initialize Git locally

git init

Create .gitignore

venv/
.env
__pycache__/

First commit

git add .
git commit -m "Initial RAG project setup"

Create GitHub Repository

Go to GitHub and:

  1. Click New Repository
  2. Name it: rag-project
  3. Do NOT initialize with README (you already have files)

Connect local repo to GitHub

git remote add origin https://github.com/YOUR_USERNAME/rag-project.git
git branch -M main
git push -u origin main

Step 3 – Install Dependencies for RAG System with Pinecone and LangGraph

pip install openai pinecone python-dotenv langgraph pypdf

Note: the old pinecone-client package has been renamed to pinecone; install the new name so the `from pinecone import Pinecone` imports below work.

Step 4 – Set Up Environment Variables

Create .env in the project root. You can get both keys by opening accounts at OpenAI and Pinecone and generating API keys from their dashboards:

OPENAI_API_KEY=your_key
PINECONE_API_KEY=your_key
PINECONE_INDEX_NAME=rag-index
PINECONE_ENV=us-east-1
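
A missing key tends to surface as a confusing error deep inside the OpenAI or Pinecone client, so a small startup check can fail fast with a clear message instead. This is a sketch; the helper name `require_env` is my own, not part of any library.

```python
import os

def require_env(*names: str) -> dict:
    # Fail fast with a clear message instead of a cryptic client error later
    missing = [n for n in names if not os.getenv(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}

# Call this right after load_dotenv() in your scripts, e.g.:
# require_env("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX_NAME")
```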

Step 5 – Add PDFs to Pinecone Recursively

File: ingest_docs.py

This file:

  • Reads your files from the data/ folder
  • Breaks them into smaller chunks
  • Converts each chunk into embeddings (vectors)
  • Stores them in Pinecone

Think of this as:
“Load my knowledge into the AI’s memory.”


# ============================================
# INGEST SCRIPT (LOAD DATA INTO PINECONE)
# ============================================

import os
import uuid
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone
from pypdf import PdfReader

# Load API keys from .env file
load_dotenv()

# Create OpenAI client (used for embeddings)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Connect to Pinecone
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index(os.getenv("PINECONE_INDEX_NAME"))

# Folder where your documents live (data/ sits next to this script)
DATA_FOLDER = "data"

# ---------------------------
# READ TEXT FILE
# ---------------------------
def load_txt(path):
    # Open and read text file
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# ---------------------------
# READ PDF FILE
# ---------------------------
def load_pdf(path):
    reader = PdfReader(path)
    text = ""

    # Loop through each page
    for page in reader.pages:
        page_text = page.extract_text()
        if page_text:
            text += page_text + "\n"

    return text

# ---------------------------
# SPLIT TEXT INTO CHUNKS
# ---------------------------
def chunk_text(text, chunk_size=500, overlap=100):
    chunks = []
    start = 0

    # Break text into overlapping chunks
    while start < len(text):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)

        # Move forward but keep some overlap
        start += chunk_size - overlap

    return chunks

# ---------------------------
# CREATE EMBEDDINGS (BATCH)
# ---------------------------
def embed_batch(texts):
    # Convert text into vectors
    res = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )

    # Return list of embeddings
    return [d.embedding for d in res.data]

# ---------------------------
# PROCESS ONE FILE
# ---------------------------
def process_file(file_path):
    print(f"\nProcessing: {file_path}")

    # Decide how to load file
    if file_path.endswith(".txt"):
        text = load_txt(file_path)
    elif file_path.endswith(".pdf"):
        text = load_pdf(file_path)
    else:
        print("Skipping unsupported file")
        return

    # Skip empty files
    if not text.strip():
        print("Empty file")
        return

    # Break text into chunks
    chunks = chunk_text(text)
    print(f"{len(chunks)} chunks created")

    # Convert chunks into embeddings
    embeddings = embed_batch(chunks)

    vectors = []

    # Prepare data for Pinecone
    for i, chunk in enumerate(chunks):
        vectors.append({
            "id": str(uuid.uuid4()),  # unique ID
            "values": embeddings[i],  # vector
            "metadata": {
                "text": chunk,        # original text
                "source": os.path.basename(file_path)
            }
        })

    # Upload to Pinecone
    index.upsert(vectors)

    print(f"Inserted {len(vectors)} chunks")

# ---------------------------
# PROCESS ALL FILES
# ---------------------------
def ingest_all():
    # Loop through every file in /data
    for filename in os.listdir(DATA_FOLDER):
        process_file(os.path.join(DATA_FOLDER, filename))

# ---------------------------
# RUN SCRIPT
# ---------------------------
if __name__ == "__main__":
    ingest_all()
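
To see the overlap behaviour of chunk_text concretely, run it on a short string with small parameters. Consecutive chunks repeat the last `overlap` characters of the previous chunk, so a sentence split across a boundary still appears whole in at least one chunk:

```python
def chunk_text(text, chunk_size=500, overlap=100):
    # Same chunker as in ingest_docs.py
    chunks = []
    start = 0
    while start < len(text):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        start += chunk_size - overlap
    return chunks

sample = "0123456789ABCDEFGHIJ"
chunks = chunk_text(sample, chunk_size=10, overlap=4)
print(chunks)  # ['0123456789', '6789ABCDEF', 'CDEFGHIJ', 'IJ']
```

Note how the last 4 characters of each chunk reappear at the start of the next; that redundancy is what keeps retrieval from missing text that straddles a chunk boundary.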

Step 6 – Create Code to Connect LLM, Database, and User Input Using LangGraph

File: lgdemo.py

👉 This script:

  • Prompts you to ask a question
  • Converts your question into an embedding
  • Searches Pinecone for relevant data
  • Sends that data to the AI
  • Returns a final answer

Think of this as:
“Ask questions using my data”


# ============================================
# MAIN RAG PIPELINE (ASK QUESTIONS)
# ============================================

import os
import logging
from typing import TypedDict, List
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec
from langgraph.graph import StateGraph, END

# Load API keys
load_dotenv()

# Enable logging
logging.basicConfig(level=logging.INFO)

# ---------------------------
# DEFINE DATA STRUCTURE
# ---------------------------
class GraphState(TypedDict):
    query: str        # user question
    embedding: List[float]  # vector form of question
    results: list     # search results
    answer: str       # final AI answer

# ---------------------------
# EMBEDDING SERVICE
# ---------------------------
class EmbeddingService:
    def __init__(self, client):
        self.client = client

    def embed(self, text):
        # Convert text into vector
        res = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return res.data[0].embedding

# ---------------------------
# PINECONE SERVICE
# ---------------------------
class PineconeService:
    def __init__(self):
        self.pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
        self.index_name = os.getenv("PINECONE_INDEX_NAME")

        # Check if index exists
        existing = [i.name for i in self.pc.list_indexes()]

        if self.index_name not in existing:
            logging.info("Creating Pinecone index...")

            self.pc.create_index(
                name=self.index_name,
                dimension=1536,
                metric="cosine",
                spec=ServerlessSpec(
                    cloud="aws",
                    region=os.getenv("PINECONE_ENV")
                )
            )

        # Connect to index
        self.index = self.pc.Index(self.index_name)

    def query(self, vector):
        # Search Pinecone using vector
        return self.index.query(
            vector=vector,
            top_k=3,
            include_metadata=True
        ).matches

# ---------------------------
# LLM SERVICE
# ---------------------------
class LLMService:
    def __init__(self):
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def generate(self, query, context):
        # Build prompt with context
        prompt = f"""
Use the context below to answer the question.

Context:
{context}

Question:
{query}
"""

        # Ask AI for answer
        res = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )

        return res.choices[0].message.content

# ---------------------------
# LANGGRAPH PIPELINE
# ---------------------------
class LangGraphService:
    def __init__(self):
        client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

        # Initialize services
        self.embedder = EmbeddingService(client)
        self.vector = PineconeService()
        self.llm = LLMService()

        # Create graph
        self.graph = StateGraph(GraphState)
        self._build()

        # Compile graph
        self.app = self.graph.compile()

    # Step 1: get question from user
    def get_query(self, state):
        return {"query": input("Enter your question: ")}

    # Step 2: convert question to vector
    def embed_query(self, state):
        return {"embedding": self.embedder.embed(state["query"])}

    # Step 3: search Pinecone
    def vector_search(self, state):
        return {"results": self.vector.query(state["embedding"])}

    # Step 4: generate answer
    def generate_answer(self, state):
        # Combine retrieved text
        context = "\n".join([r.metadata["text"] for r in state["results"]])

        # Ask AI
        answer = self.llm.generate(state["query"], context)

        return {"answer": answer}

    # Build pipeline flow
    def _build(self):
        self.graph.add_node("query", self.get_query)
        self.graph.add_node("embed", self.embed_query)
        self.graph.add_node("search", self.vector_search)
        self.graph.add_node("answer", self.generate_answer)

        # Define flow order
        self.graph.set_entry_point("query")
        self.graph.add_edge("query", "embed")
        self.graph.add_edge("embed", "search")
        self.graph.add_edge("search", "answer")
        self.graph.add_edge("answer", END)

    # Run the app
    def run(self):
        result = self.app.invoke({})

        print("\nAnswer:\n")
        print(result["answer"])

# ---------------------------
# START PROGRAM
# ---------------------------
if __name__ == "__main__":
    service = LangGraphService()
    service.run()


Step 7 – Run Your RAG System with Pinecone and LangGraph

python ingest_docs.py
python lgdemo.py

Ask a question that relates to the documents you ingested; the answer should then come from your PDF content, grounded by the retrieved context, rather than from the model's general knowledge.