Lessons Learned
Patterns, mistakes, and decisions extracted from building a labor market data warehouse. Each lesson is standalone and rooted in concrete examples from this codebase.
Design System Cross-Pollination
Porting visual feel between projects via design tokens without coupling architectures. Tokens first, typography second, dark mode site-wide.
The Federal Labor Data Landscape
Four federal data products, one occupation code. How SOC, OEWS, O*NET, and Employment Projections connect through the Standard Occupational Classification.
Dimensional Modeling for Labor Data
Facts, dimensions, bridges, and the four-layer architecture. Why separating measurements from context makes everything easier to query.
The Multi-Vintage Challenge
What breaks when you load multiple years of the same dataset, and how shared dimensions make cross-vintage time-series analysis possible.
Data Quality Traps in Government Sources
ZIP files with garbage inside, column names that change between releases, overlapping occupation groups, and suppressed values that aren't zeros.
Time-Series Normalization
Transforming point-in-time snapshots into proper time-series data. As-published vs. comparable history, and how derived metrics are computed.
Idempotent Pipeline Design
Why every load operation must be safely re-runnable. Delete-before-insert, check-before-insert, and grain uniqueness as the final safety net.
Static Site Generation
Converting a dynamic FastAPI application into static files for GitHub Pages. Fetch shims, path rewriting, and the limits of pre-rendering.
Testing and Deployment
Three test directories, schema contracts, grain uniqueness, idempotence verification, CI configuration, and the full deployment checklist.
Choosing the Right Similarity Algorithm
Why Jaccard similarity produced 100% matches for every occupation pair, and how cosine similarity on importance scores fixed it.
Thread-Safe Database Connections
Why landing page stats intermittently showed zero, and how per-thread DuckDB connections fixed the concurrency bug.
Multi-Vintage Query Pitfalls
Three traps when querying across multiple data releases: duplicate rows, missing derived metrics, and percentages without absolute context.
UI-Data Alignment
Dead-end links, mixed time periods, and static site shim gaps. Three patterns for making the interface reflect what data actually exists.
Schema Drift Detection
Column adds, removes, and retypes between releases. How validation gates block bad data from reaching the warehouse.
Inflation Adjustment with CPI
CPI-U deflation formula, base year selection, and the SQL pattern for computing real wages from nominal observations.
Crosswalk & Taxonomy Evolution
SOC version changes, 1:1/split/merge/complex mappings, and why “same code” doesn’t mean “same occupation.”
Extract Patterns for Government APIs
Browser header spoofing, retry with exponential backoff, format heterogeneity, and manifest-driven downloads.
Derived Metrics from Base Observations
Why derived metrics need their own fact table. Self-join patterns for YoY, rolling averages, and idempotent computation.
Ranked Movers & Outlier Interpretation
Small-denominator volatility, dual-direction sorting, and why suppressed data causes apparent outliers in top-movers lists.
Geography Comparison Pitfalls
State-vs-national gaps, cost-of-living confounders, suppressed cells in small states, and dimension-driven filtering.
Fetch Shim Architecture
Turning a dynamic API into a static site. JavaScript fetch interception, URL-to-file mapping, and client-side search.