Lessons

Practical lessons from building a labor market data warehouse. Each lesson covers a concept encountered during development, with concrete examples from this codebase.

Lesson 1

The Federal Labor Data Landscape

Four federal data products, one occupation code. How SOC, OEWS, O*NET, and Employment Projections connect through the Standard Occupational Classification.

Data sources & relationships
Lesson 2

Dimensional Modeling for Labor Data

Facts, dimensions, bridges, and the four-layer architecture. Why separating measurements from context makes everything easier to query.

Architecture & data modeling
Lesson 3

The Multi-Vintage Challenge

What breaks when you load multiple years of the same dataset, and how shared dimensions make cross-vintage time-series analysis possible.

Multi-year data integration
Lesson 4

Data Quality Traps in Government Sources

ZIP files with garbage inside, column names that change between releases, overlapping occupation groups, and suppressed values that aren't zeros.

Data quality & defensive parsing
Lesson 5

Time-Series Normalization

Transforming point-in-time snapshots into proper time-series data. As-published vs. comparable history, and how derived metrics are computed.

Time-series design & analysis
Lesson 6

Idempotent Pipeline Design

Why every load operation must be safely re-runnable. Delete-before-insert, check-before-insert, and grain uniqueness as the final safety net.

Pipeline reliability patterns
Lesson 7

Static Site Generation

Converting a dynamic FastAPI application into static files for GitHub Pages. Fetch shims, path rewriting, and the limits of pre-rendering.

Deployment & static generation
Lesson 8

Testing and Deployment

Three test directories, schema contracts, grain uniqueness, idempotence verification, CI configuration, and the full deployment checklist.

Testing strategy & CI/CD
Lesson 9

Choosing the Right Similarity Algorithm

Why Jaccard similarity produced 100% matches for every occupation pair, and how cosine similarity on importance scores fixed it.

Algorithm selection & data structure
Lesson 10

Thread-Safe Database Connections

Why landing page stats intermittently showed zero, and how per-thread DuckDB connections fixed the concurrency bug.

Concurrency & web server architecture
Lesson 11

Multi-Vintage Query Pitfalls

Three traps when querying across multiple data releases: duplicate rows, missing derived metrics, and percentages without absolute context.

Querying across multiple data releases
Lesson 12

UI-Data Alignment

Dead-end links, mixed time periods, and static site shim gaps. Three patterns for making the interface reflect what data actually exists.

UI design & data availability
Lesson 13

Schema Drift Detection

Column adds, removes, and retypes between releases. How validation gates block bad data from reaching the warehouse.

Data validation & drift detection
Lesson 14

Inflation Adjustment with CPI

CPI-U deflation formula, base year selection, and the SQL pattern for computing real wages from nominal observations.

Real wages & economic adjustment
Lesson 15

Crosswalk & Taxonomy Evolution

SOC version changes, 1:1/split/merge/complex mappings, and why “same code” doesn’t mean “same occupation.”

Occupation code versioning
Lesson 16

Extract Patterns for Government APIs

Browser header spoofing, retry with exponential backoff, format heterogeneity, and manifest-driven downloads.

Data extraction & HTTP patterns
Lesson 17

Derived Metrics from Base Observations

Why derived metrics need their own fact table. Self-join patterns for YoY, rolling averages, and idempotent computation.

Derived series & computation
Lesson 18

Ranked Movers & Outlier Interpretation

Small-denominator volatility, dual-direction sorting, and why suppressed data causes apparent outliers in top-movers lists.

Statistical interpretation
Lesson 19

Geography Comparison Pitfalls

State-vs-national gaps, cost-of-living confounders, suppressed cells in small states, and dimension-driven filtering.

Geographic data challenges
Lesson 20

Fetch Shim Architecture

Turning a dynamic API into a static site. JavaScript fetch interception, URL-to-file mapping, and client-side search.

Static site & deployment