Lessons
Practical lessons from building a labor market data warehouse. Each lesson covers a concept encountered during development, with concrete examples from this codebase.
The Federal Labor Data Landscape
Four federal data products, one occupation code. How SOC, OEWS, O*NET, and Employment Projections connect through the Standard Occupational Classification.
Lesson 2Dimensional Modeling for Labor Data
Facts, dimensions, bridges, and the four-layer architecture. Why separating measurements from context makes everything easier to query.
Lesson 3The Multi-Vintage Challenge
What breaks when you load multiple years of the same dataset, and how shared dimensions make cross-vintage time-series analysis possible.
Lesson 4Data Quality Traps in Government Sources
ZIP files with garbage inside, column names that change between releases, overlapping occupation groups, and suppressed values that aren't zeros.
Lesson 5Time-Series Normalization
Transforming point-in-time snapshots into proper time-series data. As-published vs. comparable history, and how derived metrics are computed.
Lesson 6Idempotent Pipeline Design
Why every load operation must be safely re-runnable. Delete-before-insert, check-before-insert, and grain uniqueness as the final safety net.
Lesson 7Static Site Generation
Converting a dynamic FastAPI application into static files for GitHub Pages. Fetch shims, path rewriting, and the limits of pre-rendering.
Lesson 8Testing and Deployment
Three test directories, schema contracts, grain uniqueness, idempotence verification, CI configuration, and the full deployment checklist.
Lesson 9Choosing the Right Similarity Algorithm
Why Jaccard similarity produced 100% matches for every occupation pair, and how cosine similarity on importance scores fixed it.
Lesson 10Thread-Safe Database Connections
Why landing page stats intermittently showed zero, and how per-thread DuckDB connections fixed the concurrency bug.
Lesson 11Multi-Vintage Query Pitfalls
Three traps when querying across multiple data releases: duplicate rows, missing derived metrics, and percentages without absolute context.
Lesson 12UI-Data Alignment
Dead-end links, mixed time periods, and static site shim gaps. Three patterns for making the interface reflect what data actually exists.
Lesson 13Schema Drift Detection
Column adds, removes, and retypes between releases. How validation gates block bad data from reaching the warehouse.
Lesson 14Inflation Adjustment with CPI
CPI-U deflation formula, base year selection, and the SQL pattern for computing real wages from nominal observations.
Lesson 15Crosswalk & Taxonomy Evolution
SOC version changes, 1:1/split/merge/complex mappings, and why “same code” doesn’t mean “same occupation.”
Lesson 16Extract Patterns for Government APIs
Browser header spoofing, retry with exponential backoff, format heterogeneity, and manifest-driven downloads.
Lesson 17Derived Metrics from Base Observations
Why derived metrics need their own fact table. Self-join patterns for YoY, rolling averages, and idempotent computation.
Lesson 18Ranked Movers & Outlier Interpretation
Small-denominator volatility, dual-direction sorting, and why suppressed data causes apparent outliers in top-movers lists.
Lesson 19Geography Comparison Pitfalls
State-vs-national gaps, cost-of-living confounders, suppressed cells in small states, and dimension-driven filtering.
Lesson 20Fetch Shim Architecture
Turning a dynamic API into a static site. JavaScript fetch interception, URL-to-file mapping, and client-side search.