Thread-Safe Database Connections

Lesson 10 — Concurrency & web server architecture

The Symptom: Intermittent Zeros

The landing page showed occupation counts, task counts, and other statistics fetched from the /api/stats endpoint. Most of the time it worked perfectly. But on some page refreshes, one or more stats showed 0 instead of the correct value. The API returned correct data when tested with curl.

Key clue: The bug was intermittent and only appeared in the browser, never in sequential curl requests. This pointed to a concurrency issue.

How FastAPI Dispatches Sync Endpoints

FastAPI has two kinds of endpoint handlers:

# Async handler — runs on the main event loop thread
async def stats():
    ...

# Sync handler — runs in a THREAD POOL (default: 40 workers)
def stats():
    ...

All the API endpoints in this project used sync handlers (def, not async def). FastAPI dispatches each sync handler to a thread from anyio's default thread pool. When the landing page fires two fetch requests simultaneously (/api/stats and /api/occupations/15-1252), they run on different threads.

DuckDB Connections Are Not Thread-Safe

The database layer shared a single DuckDB connection across the entire process:

# BROKEN: shared connection, no thread safety
_conn = None

def get_db():
    global _conn
    if _conn is not None:
        return _conn
    _conn = duckdb.connect("warehouse.duckdb", read_only=True)
    return _conn

When two threads used this connection simultaneously, DuckDB's internal state could get corrupted — one thread's query might return empty results, partial results, or the wrong data entirely.

The Fix: Per-Thread Connections

Python's threading.local() gives each thread its own storage. Each thread gets its own DuckDB connection that no other thread can interfere with:

# FIXED: per-thread connections
_local = threading.local()
_test_conn = None  # global override for tests

def get_db():
    # Test injection takes priority (visible to all threads)
    if _test_conn is not None:
        return _test_conn

    conn = getattr(_local, "conn", None)
    if conn is not None:
        return conn

    _local.conn = duckdb.connect(resolved_path, read_only=True)
    return _local.conn

Test complication: set_db() injects a test fixture connection from the main test thread. But FastAPI's worker threads wouldn't see it in threading.local(). The fix: a global _test_conn that takes priority, visible to all threads.

When This Applies

FrameworkSync Handler DispatchNeeds Per-Thread DB?
FastAPI (sync def)Thread poolYes
FastAPI (async def)Event loop (single thread)No
Flask (threaded=True)Thread per requestYes
Django (WSGI)Thread per requestYes