Why Build LLM API with FastAPI and OpenAI?
If you want to build an LLM API with FastAPI and OpenAI, this guide will walk you through it step by step. Instead of another basic demo, we'll create a production-ready API that clients can actually use, complete with documentation, error handling, JSON requests, and proper structure.
By the end, you’ll have a professional backend AI service you can showcase in your portfolio or offer to consulting clients.
Project Structure for a Production LLM API
In this guide, we’re going to build a professional LLM API with FastAPI and OpenAI, step by step. By the end, you’ll have:
- An async FastAPI endpoint capable of handling multiple requests
- JSON and query-based inputs so your API is flexible
- Caching and health checks to save time, cost, and headaches
- Ready-to-use Python and JavaScript client examples
- A smooth VSCode workflow with standalone testing tools
Let’s get started!
Step 1: Install Dependencies to Build LLM API with FastAPI and OpenAI
Create a Python virtual environment and install your dependencies. This ensures your project is isolated and professional.
# Create and activate venv
python -m venv venv
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Install dependencies
pip install fastapi uvicorn openai python-dotenv
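The client we build next reads your OpenAI key from the environment, so create a .env file in the project root as well. The key name OPENAI_API_KEY is what llm.py expects; the value shown is a placeholder for your own key:
# .env (never commit this file)
OPENAI_API_KEY=your-openai-api-key-here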
Step 2: Create the LLM Client for OpenAI Integration (llm.py)
This class wraps the OpenAI API so your endpoints stay clean and reusable.
import os
import logging

from openai import OpenAI, OpenAIError
from dotenv import load_dotenv

load_dotenv()

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def get_api_key() -> str:
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        logger.error("OPENAI_API_KEY not set")
        raise ValueError("OPENAI_API_KEY environment variable not set")
    return api_key


class LLMClient:
    def __init__(self, api_key: str, model: str = "gpt-4o-mini"):
        self.model = model
        try:
            self.client = OpenAI(api_key=api_key)
        except OpenAIError:
            logger.exception("Failed to initialize OpenAI client")
            raise

    def chat(self, prompt: str) -> str:
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}]
            )
            # The v1 OpenAI SDK returns message objects, so use attribute access
            return response.choices[0].message.content
        except Exception as e:
            logger.exception("Error generating response")
            return f"Error generating response: {e}"
Step 3: Build LLM API with FastAPI and OpenAI — The Main App (main.py)
Here we combine async endpoints, caching, health checks, and flexible input handling.
from fastapi import FastAPI, HTTPException, Query, Request
from pydantic import BaseModel
import logging
import asyncio
from functools import lru_cache

from openai import OpenAIError

from app.llm import LLMClient, get_api_key

app = FastAPI(title="Pro LLM API", version="1.0")

logger = logging.getLogger("api")
logger.setLevel(logging.INFO)


# Models
class ChatRequest(BaseModel):
    # Documents the expected POST body; the endpoint reads raw JSON to support both methods
    prompt: str


class ChatResponse(BaseModel):
    response: str


# Initialize LLM client once at startup
try:
    api_key = get_api_key()
    llm_client = LLMClient(api_key)
except Exception:
    logger.exception("Failed to initialize LLM client")
    llm_client = None


# Optional caching: repeated prompts are served from memory
# (note that error strings returned by chat() get cached too)
@lru_cache(maxsize=128)
def cached_chat(prompt: str) -> str:
    return llm_client.chat(prompt)


# Health check
@app.get("/health")
async def health_check():
    status = "ok" if llm_client else "LLM client unavailable"
    return {"status": status}


# Unified chat endpoint (GET + POST)
@app.api_route("/chat", methods=["GET", "POST"], response_model=ChatResponse)
async def chat_endpoint(request: Request, prompt: str = Query(None)):
    if not llm_client:
        raise HTTPException(status_code=500, detail="LLM client not available")

    # Determine the prompt from the JSON body (POST) or query string (GET)
    if request.method == "POST":
        data = await request.json()
        prompt_val = data.get("prompt")
        if not prompt_val:
            raise HTTPException(status_code=422, detail="POST request must include 'prompt'")
    else:  # GET
        if not prompt:
            raise HTTPException(status_code=422, detail="GET request must include 'prompt'")
        prompt_val = prompt

    # Run the blocking OpenAI call in a thread so the event loop stays free
    loop = asyncio.get_running_loop()
    try:
        llm_response = await loop.run_in_executor(None, cached_chat, prompt_val)
        return ChatResponse(response=llm_response)
    except OpenAIError as e:
        logger.error(f"OpenAI API error: {e}")
        raise HTTPException(status_code=502, detail="OpenAI API error")
    except Exception as e:
        logger.exception(f"Unexpected error: {e}")
        raise HTTPException(status_code=500, detail="Internal server error")
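To try the API locally, start the development server with Uvicorn. Assuming main.py sits in your project root next to the app/ package, something like this works:
uvicorn main:app --reload
# Interactive docs: http://127.0.0.1:8000/docs
# Health check:     http://127.0.0.1:8000/health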
Step 4: How Clients Use Your LLM API
Any client that can send HTTP requests can consume the API, whether it is written in Python, JavaScript, or anything else. The Python example below shows both the GET and POST styles; a JavaScript client is simply a fetch call against the same /chat endpoint.
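Here is a sketch using the requests library (assumptions: the server is running locally on port 8000, and you have installed requests with pip):
import requests

BASE_URL = "http://127.0.0.1:8000"

# GET with a query parameter
r = requests.get(f"{BASE_URL}/chat", params={"prompt": "Tell me a joke"})
print(r.json()["response"])

# POST with a JSON body
r = requests.post(f"{BASE_URL}/chat", json={"prompt": "Summarize FastAPI in one line"})
print(r.json()["response"])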
Step 5: Testing the FastAPI LLM API in VSCode
Recommended VSCode Extensions:
- Python – for running FastAPI and IntelliSense
- REST Client – send .http requests directly from VSCode
- Thunder Client – lightweight Postman alternative
- Pylance – type checking and autocompletion
Example .http File for Quick Testing:
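A sketch of such a file, assuming the server runs on the default Uvicorn port 8000 (the ### lines let the REST Client extension send each request independently):
### GET with a query parameter
GET http://127.0.0.1:8000/chat?prompt=Hello

### POST with a JSON body
POST http://127.0.0.1:8000/chat
Content-Type: application/json

{
    "prompt": "Explain FastAPI in one sentence"
}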
Step 6: Deploying Your LLM API to Production
- Use caching for repeated prompts to reduce API costs.
- Combine GET/POST endpoints for flexible client usage.
- Always include health checks in production APIs.
- Use FastAPI docs (/docs) for interactive client testing.
- Share .http or Postman collections for plug-and-play API testing.
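For the deployment itself, a common pattern is to run Uvicorn with multiple worker processes behind your reverse proxy of choice. The host, port, and worker count below are example values, not requirements:
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4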
Step 7: Back Up Your FastAPI and OpenAI LLM API to GitHub Using VSCode
Once you build your LLM API with FastAPI and OpenAI, the next professional move is backing it up to GitHub. This protects your work, creates version history, and makes it easier to share your API with clients or collaborators.
1️⃣ Create .gitignore FIRST
Create a .gitignore file in your project root:
Then list the files and folders that should never be committed, as shown below.
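A typical set of entries for this project (adjust to your own setup) is:
# Environment variables and secrets
.env

# Virtual environment
venv/

# Python cache
__pycache__/
*.pyc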
Never upload your .env file or virtual environment. Your OpenAI API key should always stay private.
2️⃣ Initialize Git
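From the VSCode terminal (or any shell) in your project root, one common way to initialize the repository and name the default branch is:
git init
git branch -M main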
3️⃣ Add and Commit
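Stage everything that is not ignored and create the first commit; the commit message here is just an example:
git add .
git commit -m "Initial commit: FastAPI + OpenAI LLM API"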
4️⃣ Create GitHub Repository
Create repo on GitHub (leave it empty).
5️⃣ Connect and Push
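Copy the repository URL GitHub shows you and add it as the remote, then push. The URL below is a placeholder for your own username and repository name:
git remote add origin https://github.com/your-username/your-llm-api.git
git push -u origin main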
Now your FastAPI and OpenAI LLM API project is safely backed up and ready to showcase in your portfolio or share with clients.
Official Documentation to Build LLM API with FastAPI and OpenAI
When you build an LLM API with FastAPI and OpenAI, relying on official documentation ensures your implementation follows best practices, stays secure, and remains production-ready. Be sure to reference the FastAPI documentation for framework guidance, the OpenAI API documentation for model integration, the Pydantic documentation for request validation, and the Uvicorn ASGI server documentation for running your application in development and production environments.
If you’re new to backend development, read my guide on Building REST APIs with Python.
✅ Conclusion
With this setup, you have a fully professional LLM API:
- Async, cached, and ready for multiple clients
- Flexible input methods (JSON POST + GET query)
- Health check endpoint for monitoring
- Interactive documentation and client-ready examples
You’re ready to ship your own AI API or let clients plug in immediately!

