Thread-Safe Database Connections
The Symptom: Intermittent Zeros
The landing page showed occupation counts, task counts, and other statistics fetched
from the /api/stats endpoint. Most of the time it worked perfectly. But
on some page refreshes, one or more stats showed 0 instead of the
correct value. The API returned correct data when tested with curl.
Key clue: The bug was intermittent and only appeared in the browser, never in sequential curl requests. This pointed to a concurrency issue.
How FastAPI Dispatches Sync Endpoints
FastAPI has two kinds of endpoint handlers:
# Async handler — runs on the main event loop thread
async def stats():
...
# Sync handler — runs in a THREAD POOL (default: 40 workers)
def stats():
...
All the API endpoints in this project used sync handlers (def, not
async def). FastAPI dispatches each sync handler to a thread from
anyio's default thread pool. When the landing page fires two fetch
requests simultaneously (/api/stats and /api/occupations/15-1252),
they run on different threads.
DuckDB Connections Are Not Thread-Safe
The database layer shared a single DuckDB connection across the entire process:
# BROKEN: shared connection, no thread safety
_conn = None
def get_db():
global _conn
if _conn is not None:
return _conn
_conn = duckdb.connect("warehouse.duckdb", read_only=True)
return _conn
When two threads used this connection simultaneously, DuckDB's internal state could get corrupted — one thread's query might return empty results, partial results, or the wrong data entirely.
The Fix: Per-Thread Connections
Python's threading.local() gives each thread its own storage. Each
thread gets its own DuckDB connection that no other thread can interfere with:
# FIXED: per-thread connections
_local = threading.local()
_test_conn = None # global override for tests
def get_db():
# Test injection takes priority (visible to all threads)
if _test_conn is not None:
return _test_conn
conn = getattr(_local, "conn", None)
if conn is not None:
return conn
_local.conn = duckdb.connect(resolved_path, read_only=True)
return _local.conn
Test complication: set_db() injects a test fixture
connection from the main test thread. But FastAPI's worker threads wouldn't see
it in threading.local(). The fix: a global _test_conn
that takes priority, visible to all threads.
When This Applies
| Framework | Sync Handler Dispatch | Needs Per-Thread DB? |
|---|---|---|
| FastAPI (sync def) | Thread pool | Yes |
| FastAPI (async def) | Event loop (single thread) | No |
| Flask (threaded=True) | Thread per request | Yes |
| Django (WSGI) | Thread per request | Yes |