Why Build an LLM API with FastAPI and OpenAI?

If you want to build an LLM API with FastAPI and OpenAI, this guide will walk you through it step by step. Instead of another basic demo, we’ll create a production-ready API that clients can actually use, complete with documentation, error handling, JSON requests, and proper structure.

By the end, you’ll have a professional backend AI service you can showcase in your portfolio or offer to consulting clients.

Project Structure for a Production LLM API

In this guide, we’re going to build a professional LLM API with FastAPI and OpenAI, step by step. By the end, you’ll have:

  • An async FastAPI endpoint capable of handling multiple requests

  • JSON and query-based inputs so your API is flexible

  • Caching and health checks to save time, cost, and headaches

  • Ready-to-use Python and JavaScript client examples

  • A smooth VSCode workflow with standalone testing tools
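
For reference, here is the project layout the rest of this guide assumes (the app/ package name comes from the imports in Step 3; adjust it if your structure differs):

your-project/
├── app/
│   ├── __init__.py   # can be empty; makes app an importable package
│   ├── llm.py        # OpenAI client wrapper (Step 2)
│   └── main.py       # FastAPI application (Step 3)
├── .env              # holds OPENAI_API_KEY (created in Step 1, never committed)
└── venv/             # virtual environment (Step 1)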

Let’s get started!

 


Step 1: Install Dependencies to Build LLM API with FastAPI and OpenAI

Create a Python virtual environment and install your dependencies. This ensures your project is isolated and professional.


# Create and activate venv
python -m venv venv
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate  # Windows

# Install dependencies
pip install fastapi uvicorn openai python-dotenv
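
The code in the next step loads your OpenAI key from the environment with python-dotenv, so create a .env file in the project root now. It only needs one line (replace the placeholder with your real key):

OPENAI_API_KEY=your-openai-api-key-here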

Step 2: Create the LLM Client for OpenAI Integration (llm.py)

This class wraps the OpenAI API so your endpoints stay clean and reusable.


import os
import logging
from openai import OpenAI, OpenAIError
from dotenv import load_dotenv

load_dotenv()
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

def get_api_key() -> str:
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        logger.error("OPENAI_API_KEY not set")
        raise ValueError("OPENAI_API_KEY environment variable not set")
    return api_key

class LLMClient:
    def __init__(self, api_key: str, model: str = "gpt-4o-mini"):
        self.model = model
        try:
            self.client = OpenAI(api_key=api_key)
        except OpenAIError as e:
            logger.exception("Failed to initialize OpenAI client")
            raise e

    def chat(self, prompt: str) -> str:
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}]
            )
            return response.choices[0].message.content
        except OpenAIError:
            # Re-raise so the API layer can map OpenAI failures to a 502
            logger.exception("OpenAI API error while generating response")
            raise
        except Exception as e:
            logger.exception("Error generating response")
            return f"Error generating response: {e}"

Step 3: Build LLM API with FastAPI and OpenAI — The Main App (main.py)

Here we combine async endpoints, caching, health checks, and flexible input handling.


from fastapi import FastAPI, HTTPException, Query, Request
from pydantic import BaseModel
import logging
from app.llm import LLMClient, get_api_key
from openai import OpenAIError
import asyncio
from functools import lru_cache

app = FastAPI(title="Pro LLM API", version="1.0")
logger = logging.getLogger("api")
logger.setLevel(logging.INFO)

# Models
class ChatRequest(BaseModel):
    prompt: str

class ChatResponse(BaseModel):
    response: str

# Initialize LLM client
try:
    api_key = get_api_key()
    llm_client = LLMClient(api_key)
except Exception as e:
    logger.exception("Failed to initialize LLM client")
    llm_client = None

# Optional caching
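# lru_cache memoizes up to 128 identical prompts in this process, so repeated
# prompts skip a paid OpenAI call (the cache is lost whenever the server restarts)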
@lru_cache(maxsize=128)
def cached_chat(prompt: str) -> str:
    return llm_client.chat(prompt)

# Health check
@app.get("/health")
async def health_check():
    status = "ok" if llm_client else "LLM client unavailable"
    return {"status": status}

# Unified chat endpoint (GET + POST)
@app.api_route("/chat", methods=["GET", "POST"], response_model=ChatResponse)
async def chat_endpoint(request: Request, prompt: str = Query(None)):
    if not llm_client:
        raise HTTPException(status_code=500, detail="LLM client not available")

    # Determine prompt
    if request.method == "POST":
        data = await request.json()
        prompt_val = data.get("prompt")
        if not prompt_val:
            raise HTTPException(status_code=422, detail="POST request must include 'prompt'")
    else:  # GET
        if not prompt:
            raise HTTPException(status_code=422, detail="GET request must include 'prompt'")
        prompt_val = prompt

    # Run the blocking OpenAI call in a worker thread so the event loop stays free
    loop = asyncio.get_running_loop()
    try:
        llm_response = await loop.run_in_executor(None, cached_chat, prompt_val)
        return ChatResponse(response=llm_response)
    except OpenAIError as e:
        logger.error(f"OpenAI API error: {e}")
        raise HTTPException(status_code=502, detail="OpenAI API error")
    except Exception as e:
        logger.exception(f"Unexpected error: {e}")
        raise HTTPException(status_code=500, detail="Internal server error")
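
With both files in place, start the development server from the project root with uvicorn app.main:app --reload (the app.main module path assumes the layout sketched earlier), then open http://127.0.0.1:8000/docs to try the endpoints through FastAPI’s interactive documentation.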

Step 4: How Clients Use Your LLM API

Python


import requests

url = "http://127.0.0.1:8000/chat"
resp = requests.post(url, json={"prompt": "Tell me a joke"})
print(resp.json()["response"])
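
Because the endpoint also accepts GET requests, the same call works with a simple query parameter:

import requests

url = "http://127.0.0.1:8000/chat"
resp = requests.get(url, params={"prompt": "Tell me a joke"})
print(resp.json()["response"])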

JavaScript

fetch("http://127.0.0.1:8000/chat", {
method: "POST",
headers: {"Content-Type": "application/json"},
body: JSON.stringify({ prompt: "Give me a motivational quote" })
})
.then(res => res.json())
.then(data => console.log(data.response));

Step 5: Testing the FastAPI LLM API in VSCode

Recommended VSCode Extensions:

  • Python – for running FastAPI and IntelliSense

  • REST Client – send .http requests directly from VSCode

  • Thunder Client – lightweight Postman alternative

  • Pylance – type checking and autocompletion

Example .http File for Quick Testing:

GET http://127.0.0.1:8000/chat?prompt=Hello

###

POST http://127.0.0.1:8000/chat
Content-Type: application/json

{
  "prompt": "Tell me a joke"
}
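
With the REST Client extension installed, a Send Request link appears above each request in the .http file and the response opens alongside your code, which makes quick iteration on prompts painless.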


Step 6: Deploying Your LLM API to Production

  • Use caching for repeated prompts to reduce API costs.

  • Combine GET/POST endpoints for flexible client usage.

  • Always include health checks in production APIs.

  • Use FastAPI docs (/docs) for interactive client testing.

  • Share .http or Postman collections for plug-and-play API testing.
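
For the deployment itself, a common pattern is to run Uvicorn with several worker processes behind a reverse proxy, for example uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4. Keep in mind that the lru_cache used above lives inside each process, so every worker keeps its own cache.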

 

Step 7: Back Up Your FastAPI and OpenAI LLM API to GitHub Using VSCode

Once you build an LLM API with FastAPI and OpenAI, the next professional move is backing it up to GitHub. This protects your work, creates version history, and makes it easier to share your API with clients or collaborators.

1️⃣ Create .gitignore FIRST

Create a .gitignore file in your project root:

Copy and paste the following entries into the .gitignore file:

venv/
.env
__pycache__/
*.pyc

Never upload your .env file or virtual environment. Your OpenAI API key should always stay private.

 


2️⃣ Initialize Git

git init

3️⃣ Add and Commit

git add .
git commit -m "Initial commit - FastAPI LLM API project"

4️⃣ Create GitHub Repository

Create a new repository on GitHub and leave it empty (no README, license, or .gitignore), so your first push goes through without conflicts.


5️⃣ Connect and Push

git remote add origin https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git
git branch -M main
git push -u origin main

Now your FastAPI and OpenAI LLM API project is safely backed up and ready to showcase in your portfolio or share with clients.

 


Official Documentation to Build LLM API with FastAPI and OpenAI

When you build an LLM API with FastAPI and OpenAI, relying on official documentation ensures your implementation follows best practices, stays secure, and remains production-ready. Be sure to reference the FastAPI documentation for framework guidance, the OpenAI API documentation for model integration, the Pydantic documentation for request validation, and the Uvicorn ASGI server documentation for running your application in development and production environments.

If you’re new to backend development, read my guide on Building REST APIs with Python.


✅ Conclusion

With this setup, you have a fully professional LLM API:

  • Async, cached, and ready for multiple clients

  • Flexible input methods (JSON POST + GET query)

  • Health check endpoint for monitoring

  • Interactive documentation and client-ready examples

You’re ready to ship your own AI API or let clients plug in immediately!