Why Build LLM API with FastAPI and OpenAI?
If you want to build an LLM API with FastAPI and OpenAI, this guide will walk you through it step by step. Instead of another basic demo, we'll create a production-ready API that clients can actually use, complete with documentation, error handling, JSON requests, and proper structure.
By the end, you’ll have a professional backend AI service you can showcase in your portfolio or offer to consulting clients.
Project Structure for a Production LLM API
In this guide, we’re going to build a professional LLM API with FastAPI and OpenAI, step by step. By the end, you’ll have:
- An async FastAPI endpoint capable of handling multiple requests
- JSON and query-based inputs so your API is flexible
- Caching and health checks to save time, cost, and headaches
- Ready-to-use Python and JavaScript client examples
- A smooth VSCode workflow with standalone testing tools
Let’s get started!
Step 1: Install Dependencies to Build LLM API with FastAPI and OpenAI
Create a Python virtual environment and install your dependencies. This ensures your project is isolated and professional.
# Create and activate venv
python -m venv venv
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Install dependencies
pip install fastapi uvicorn openai python-dotenv
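The client we build next reads your OpenAI key from the environment, so create a .env file in the project root as well. The key name OPENAI_API_KEY is what llm.py expects; the value shown is a placeholder for your own key:
# .env (never commit this file)
OPENAI_API_KEY=your-openai-api-key-here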
Step 2: Create the LLM Client for OpenAI Integration (llm.py)
This class wraps the OpenAI API so your endpoints stay clean and reusable.
import os
import logging

from openai import OpenAI, OpenAIError
from dotenv import load_dotenv

load_dotenv()

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def get_api_key() -> str:
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        logger.error("OPENAI_API_KEY not set")
        raise ValueError("OPENAI_API_KEY environment variable not set")
    return api_key


class LLMClient:
    def __init__(self, api_key: str, model: str = "gpt-4o-mini"):
        self.model = model
        try:
            self.client = OpenAI(api_key=api_key)
        except OpenAIError:
            logger.exception("Failed to initialize OpenAI client")
            raise

    def chat(self, prompt: str) -> str:
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}]
            )
            # The v1 OpenAI SDK returns message objects, so use attribute access
            return response.choices[0].message.content
        except Exception as e:
            logger.exception("Error generating response")
            return f"Error generating response: {e}"
Step 3: Build LLM API with FastAPI and OpenAI — The Main App (main.py)
Here we combine async endpoints, caching, health checks, and flexible input handling.
from fastapi import FastAPI, HTTPException, Query, Request
from pydantic import BaseModel
import logging
import asyncio
from functools import lru_cache

from openai import OpenAIError

from app.llm import LLMClient, get_api_key

app = FastAPI(title="Pro LLM API", version="1.0")

logger = logging.getLogger("api")
logger.setLevel(logging.INFO)


# Models
class ChatRequest(BaseModel):
    # Documents the expected POST body; the endpoint reads raw JSON to support both methods
    prompt: str


class ChatResponse(BaseModel):
    response: str


# Initialize LLM client once at startup
try:
    api_key = get_api_key()
    llm_client = LLMClient(api_key)
except Exception:
    logger.exception("Failed to initialize LLM client")
    llm_client = None


# Optional caching: repeated prompts are served from memory
# (note that error strings returned by chat() get cached too)
@lru_cache(maxsize=128)
def cached_chat(prompt: str) -> str:
    return llm_client.chat(prompt)


# Health check
@app.get("/health")
async def health_check():
    status = "ok" if llm_client else "LLM client unavailable"
    return {"status": status}


# Unified chat endpoint (GET + POST)
@app.api_route("/chat", methods=["GET", "POST"], response_model=ChatResponse)
async def chat_endpoint(request: Request, prompt: str = Query(None)):
    if not llm_client:
        raise HTTPException(status_code=500, detail="LLM client not available")

    # Determine the prompt from the JSON body (POST) or query string (GET)
    if request.method == "POST":
        data = await request.json()
        prompt_val = data.get("prompt")
        if not prompt_val:
            raise HTTPException(status_code=422, detail="POST request must include 'prompt'")
    else:  # GET
        if not prompt:
            raise HTTPException(status_code=422, detail="GET request must include 'prompt'")
        prompt_val = prompt

    # Run the blocking OpenAI call in a thread so the event loop stays free
    loop = asyncio.get_running_loop()
    try:
        llm_response = await loop.run_in_executor(None, cached_chat, prompt_val)
        return ChatResponse(response=llm_response)
    except OpenAIError as e:
        logger.error(f"OpenAI API error: {e}")
        raise HTTPException(status_code=502, detail="OpenAI API error")
    except Exception as e:
        logger.exception(f"Unexpected error: {e}")
        raise HTTPException(status_code=500, detail="Internal server error")
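To try the API locally, start the development server with Uvicorn. Assuming main.py sits in your project root next to the app/ package, something like this works:
uvicorn main:app --reload
# Interactive docs: http://127.0.0.1:8000/docs
# Health check:     http://127.0.0.1:8000/health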
Step 4: How Clients Use Your LLM API
Any client that can send HTTP requests can consume the API, whether it is written in Python, JavaScript, or anything else. The Python example below shows both the GET and POST styles; a JavaScript client is simply a fetch call against the same /chat endpoint.
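Here is a sketch using the requests library (assumptions: the server is running locally on port 8000, and you have installed requests with pip):
import requests

BASE_URL = "http://127.0.0.1:8000"

# GET with a query parameter
r = requests.get(f"{BASE_URL}/chat", params={"prompt": "Tell me a joke"})
print(r.json()["response"])

# POST with a JSON body
r = requests.post(f"{BASE_URL}/chat", json={"prompt": "Summarize FastAPI in one line"})
print(r.json()["response"])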
Step 5: Testing the FastAPI LLM API in VSCode
Recommended VSCode Extensions:
- Python – for running FastAPI and IntelliSense
- REST Client – send .http requests directly from VSCode
- Thunder Client – lightweight Postman alternative
- Pylance – type checking and autocompletion
Example .http File for Quick Testing:
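A sketch of such a file, assuming the server runs on the default Uvicorn port 8000 (the ### lines let the REST Client extension send each request independently):
### GET with a query parameter
GET http://127.0.0.1:8000/chat?prompt=Hello

### POST with a JSON body
POST http://127.0.0.1:8000/chat
Content-Type: application/json

{
    "prompt": "Explain FastAPI in one sentence"
}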
Step 6: Deploying Your LLM API to Production
- Use caching for repeated prompts to reduce API costs.
- Combine GET/POST endpoints for flexible client usage.
- Always include health checks in production APIs.
- Use FastAPI docs (/docs) for interactive client testing.
- Share .http or Postman collections for plug-and-play API testing.
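For the deployment itself, a common pattern is to run Uvicorn with multiple worker processes behind your reverse proxy of choice. The host, port, and worker count below are example values, not requirements:
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4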
Step 7: Back Up Your FastAPI and OpenAI LLM API to GitHub Using VSCode
Once you build your LLM API with FastAPI and OpenAI, the next professional move is backing it up to GitHub. This protects your work, creates version history, and makes it easier to share your API with clients or collaborators.
1️⃣ Create .gitignore FIRST
Create a .gitignore file in your project root:
Then list the files and folders that should never be committed, as shown below.
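A typical set of entries for this project (adjust to your own setup) is:
# Environment variables and secrets
.env

# Virtual environment
venv/

# Python cache
__pycache__/
*.pyc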
Never upload your .env file or virtual environment. Your OpenAI API key should always stay private.
2️⃣ Initialize Git
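From the VSCode terminal (or any shell) in your project root, one common way to initialize the repository and name the default branch is:
git init
git branch -M main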
3️⃣ Add and Commit
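Stage everything that is not ignored and create the first commit; the commit message here is just an example:
git add .
git commit -m "Initial commit: FastAPI + OpenAI LLM API"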
4️⃣ Create GitHub Repository
Create repo on GitHub (leave it empty).
5️⃣ Connect and Push
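Copy the repository URL GitHub shows you and add it as the remote, then push. The URL below is a placeholder for your own username and repository name:
git remote add origin https://github.com/your-username/your-llm-api.git
git push -u origin main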
Now your FastAPI and OpenAI LLM API project is safely backed up and ready to showcase in your portfolio or share with clients.
Official Documentation to Build LLM API with FastAPI and OpenAI
When you build an LLM API with FastAPI and OpenAI, relying on official documentation ensures your implementation follows best practices, stays secure, and remains production-ready. Be sure to reference the FastAPI documentation for framework guidance, the OpenAI API documentation for model integration, the Pydantic documentation for request validation, and the Uvicorn ASGI server documentation for running your application in development and production environments.
If you’re new to backend development, read my guide on Building REST APIs with Python.
✅ Conclusion
With this setup, you have a fully professional LLM API:
- Async, cached, and ready for multiple clients
- Flexible input methods (JSON POST + GET query)
- Health check endpoint for monitoring
- Interactive documentation and client-ready examples
You’re ready to ship your own AI API or let clients plug in immediately!

