FastAPI for MLOps

Build production-ready ML APIs with FastAPI

Overview

This module teaches you to build high-performance, production-grade ML APIs using FastAPI. Designed for developers with advanced Python knowledge, this course focuses on real-world patterns for serving machine learning models at scale.

Learning Objectives

After completing this module, you will:

Build type-safe REST APIs with automatic documentation
Handle complex request/response patterns for ML workloads
Design scalable API architectures with proper database integration
Serve ML models with streaming and async inference
Write comprehensive API tests
Deploy FastAPI applications to production

Prerequisites

You should be comfortable with:

Python typing and type hints
Pydantic models and validation
Async/await and asyncio
Decorators and dependency injection concepts
Basic HTTP and REST API concepts

Required from Module 02:

Python Typing
Pydantic
Async/Await

Module Structure

1. FastAPI Fundamentals

Document: FastAPI Fundamentals

Learn the core concepts:

Why FastAPI is ideal for MLOps
Request/response models with Pydantic
Path operations and routing
Dependency injection system
Request validation and error handling
Automatic API documentation

Key Skills: Building basic CRUD endpoints, handling validation errors, using dependency injection.

2. Advanced Request Handling

Document: Advanced Request Handling

Master complex request patterns:

Query parameters and path parameters
Request body validation
File uploads for ML models and datasets
Form data and multipart requests
Headers, cookies, and custom responses
Background tasks for async processing

Key Skills: Handling file uploads, processing background tasks, managing complex request types.

3. API Architecture & Database Integration

Document: API Architecture & Database Integration

Build scalable API architectures:

Project structure for ML APIs
Router organization and modularity
Middleware (CORS, authentication, logging)
Configuration management
SQLAlchemy 2.0 async ORM
Database models and relationships
Alembic migrations
Connection pooling and session management

Key Skills: Structuring large APIs, database integration, managing migrations, implementing middleware.

4. ML Model Serving & Streaming

Document: ML Model Serving & Streaming

Serve ML models effectively:

Loading and caching models at startup
Building prediction endpoints
Batch prediction APIs
Model versioning strategies
Async inference patterns
Server-Sent Events (SSE) for streaming
WebSocket for real-time predictions
Streaming LLM token-by-token responses

Key Skills: Model lifecycle management, streaming responses, real-time predictions.

5. Testing FastAPI

Document: Testing FastAPI

Write comprehensive tests:

TestClient for endpoint testing
Testing async endpoints
Dependency overrides and mocking
Database testing with fixtures
Integration test patterns
Load testing and performance benchmarks

Key Skills: Writing maintainable tests, mocking dependencies, performance testing.

6. Production Deployment

Document: Production Deployment

Deploy to production:

Uvicorn and Gunicorn configuration
Docker containerization
Health checks and readiness probes
Structured logging and observability
Performance optimization
Security best practices

Key Skills: Production deployment, monitoring, optimization, security hardening.

Common Patterns

Pattern 1: Dependency Injection for Models

python

from fastapi import Depends
from typing import Annotated

async def get_model() -> MLModel:
    """Load model as dependency."""
    if not hasattr(get_model, "model"):
        get_model.model = await load_model()
    return get_model.model

@app.post("/predict")
async def predict(
    data: PredictionInput,
    model: Annotated[MLModel, Depends(get_model)]
) -> PredictionOutput:
    result = await model.predict(data)
    return PredictionOutput(**result)

Pattern 2: Database Session Management

python

from sqlalchemy.ext.asyncio import AsyncSession

async def get_db() -> AsyncSession:
    """Provide database session."""
    async with SessionLocal() as session:
        yield session

@app.post("/predictions")
async def save_prediction(
    data: PredictionInput,
    db: Annotated[AsyncSession, Depends(get_db)]
):
    prediction = Prediction(**data.dict())
    db.add(prediction)
    await db.commit()

Pattern 3: Streaming Responses

python

from fastapi.responses import StreamingResponse

async def generate_tokens(text: str):
    """Stream LLM tokens."""
    async for token in model.generate_stream(text):
        yield f"data: {token}\n\n"

@app.post("/generate")
async def stream_generation(prompt: str):
    return StreamingResponse(
        generate_tokens(prompt),
        media_type="text/event-stream"
    )

Tools and Libraries

Install the FastAPI ecosystem:

bash

# Core FastAPI
uv add fastapi uvicorn[standard]

# Database
uv add sqlalchemy[asyncio] alembic asyncpg

# Validation and utilities
uv add pydantic pydantic-settings python-multipart

# Development tools
uv add --dev pytest pytest-asyncio httpx

# Optional: Authentication
uv add python-jose[cryptography] passlib[bcrypt]

# Optional: Monitoring
uv add prometheus-fastapi-instrumentator

Quick Start

Create your first FastAPI app:

python

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="ML API")

class PredictionInput(BaseModel):
    features: list[float]

class PredictionOutput(BaseModel):
    prediction: float
    confidence: float

@app.post("/predict", response_model=PredictionOutput)
async def predict(data: PredictionInput) -> PredictionOutput:
    # Your ML logic here
    prediction = sum(data.features) / len(data.features)
    return PredictionOutput(prediction=prediction, confidence=0.95)

Run with:

bash

uvicorn main:app --reload

Visit http://localhost:8000/docs for interactive API documentation.

Development Workflow

Setup project:

bash

uv init ml-api
cd ml-api
uv add fastapi uvicorn[standard]

Create app structure:

ml-api/
├── src/
│   ├── main.py
│   ├── models.py
│   ├── routes/
│   └── dependencies.py
├── tests/
├── alembic/
└── pyproject.toml

Run development server:
bash
```
uvicorn src.main:app --reload
```
1
Run tests:
bash
```
pytest tests/
```
1

Generate migrations:

bash

alembic revision --autogenerate -m "Add predictions table"
alembic upgrade head

Troubleshooting

Import Errors

If modules not found:

bash

# Ensure you're in the project directory
cd ml-api

# Sync dependencies
uv sync

# Run with uv
uv run uvicorn src.main:app --reload

Async Database Issues

Common async/await mistakes:

python

# Wrong: Using sync methods
session.commit()

# Correct: Use async methods
await session.commit()

CORS Errors

Enable CORS for frontend access:

python

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Configure for production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Performance Tips

Use async everywhere - FastAPI shines with async/await
Enable response compression - For large JSON responses
Cache model predictions - Redis or in-memory caching
Connection pooling - Configure SQLAlchemy pool size
Background tasks - For non-blocking operations

Security Checklist

[ ] Input validation with Pydantic
[ ] Authentication and authorization
[ ] HTTPS in production
[ ] CORS configured properly
[ ] Rate limiting enabled
[ ] SQL injection prevention (use ORM)
[ ] Environment variables for secrets
[ ] Request size limits

Next Steps

After completing this module:

Build a project - Create your own ML API
Module 03 - CI/CD and deployment automation
Advanced topics - GraphQL, gRPC, service mesh
Scale up - Kubernetes deployment, load balancing

Additional Resources

Community and Support

Questions: Open an issue in the repository
Discussions: Join the MLOps training community
Contributions: Submit improvements via pull requests

Ready to start? Begin with FastAPI Fundamentals to build your first API.

FastAPI for MLOps ​

Overview ​

Learning Objectives ​

Prerequisites ​

Module Structure ​

1. FastAPI Fundamentals ​

2. Advanced Request Handling ​

3. API Architecture & Database Integration ​

4. ML Model Serving & Streaming ​

5. Testing FastAPI ​

6. Production Deployment ​

Common Patterns ​

Pattern 1: Dependency Injection for Models ​

Pattern 2: Database Session Management ​

Pattern 3: Streaming Responses ​

Tools and Libraries ​

Quick Start ​

Development Workflow ​

Troubleshooting ​

Import Errors ​

Async Database Issues ​

CORS Errors ​

Performance Tips ​

Security Checklist ​

Next Steps ​

Additional Resources ​

Community and Support ​

FastAPI for MLOps

Overview

Learning Objectives

Prerequisites

Module Structure

1. FastAPI Fundamentals

2. Advanced Request Handling

3. API Architecture & Database Integration

4. ML Model Serving & Streaming

5. Testing FastAPI

6. Production Deployment

Common Patterns

Pattern 1: Dependency Injection for Models

Pattern 2: Database Session Management

Pattern 3: Streaming Responses

Tools and Libraries

Quick Start

Development Workflow

Troubleshooting

Import Errors

Async Database Issues

CORS Errors

Performance Tips

Security Checklist

Next Steps

Additional Resources

Community and Support