Lessons Learned

Patterns, mistakes, and decisions extracted from building a labor market data warehouse. Each lesson is standalone and rooted in concrete examples from this codebase.

21Lessons
5Categories
Lesson 01 Architecture

Design System Cross-Pollination

Porting visual feel between projects via design tokens without coupling architectures. Tokens first, typography second, dark mode site-wide.

Architecture & visual design
Lesson 02 Data

The Federal Labor Data Landscape

Four federal data products, one occupation code. How SOC, OEWS, O*NET, and Employment Projections connect through the Standard Occupational Classification.

Data sources & relationships
Lesson 03 Architecture

Dimensional Modeling for Labor Data

Facts, dimensions, bridges, and the four-layer architecture. Why separating measurements from context makes everything easier to query.

Architecture & data modeling
Lesson 04 Architecture

The Multi-Vintage Challenge

What breaks when you load multiple years of the same dataset, and how shared dimensions make cross-vintage time-series analysis possible.

Multi-year data integration
Lesson 05 Data

Data Quality Traps in Government Sources

ZIP files with garbage inside, column names that change between releases, overlapping occupation groups, and suppressed values that aren't zeros.

Data quality & defensive parsing
Lesson 06 Pipeline

Time-Series Normalization

Transforming point-in-time snapshots into proper time-series data. As-published vs. comparable history, and how derived metrics are computed.

Time-series design & analysis
Lesson 07 Architecture

Idempotent Pipeline Design

Why every load operation must be safely re-runnable. Delete-before-insert, check-before-insert, and grain uniqueness as the final safety net.

Pipeline reliability patterns
Lesson 08 Web

Static Site Generation

Converting a dynamic FastAPI application into static files for GitHub Pages. Fetch shims, path rewriting, and the limits of pre-rendering.

Deployment & static generation
Lesson 09 Testing

Testing and Deployment

Three test directories, schema contracts, grain uniqueness, idempotence verification, CI configuration, and the full deployment checklist.

Testing strategy & CI/CD
Lesson 10 Testing

Choosing the Right Similarity Algorithm

Why Jaccard similarity produced 100% matches for every occupation pair, and how cosine similarity on importance scores fixed it.

Algorithm selection & data structure
Lesson 11 Testing

Thread-Safe Database Connections

Why landing page stats intermittently showed zero, and how per-thread DuckDB connections fixed the concurrency bug.

Concurrency & web server architecture
Lesson 12 Testing

Multi-Vintage Query Pitfalls

Three traps when querying across multiple data releases: duplicate rows, missing derived metrics, and percentages without absolute context.

Querying across multiple data releases
Lesson 13 Web

UI-Data Alignment

Dead-end links, mixed time periods, and static site shim gaps. Three patterns for making the interface reflect what data actually exists.

UI design & data availability
Lesson 14 Data

Schema Drift Detection

Column adds, removes, and retypes between releases. How validation gates block bad data from reaching the warehouse.

Data validation & drift detection
Lesson 15 Data

Inflation Adjustment with CPI

CPI-U deflation formula, base year selection, and the SQL pattern for computing real wages from nominal observations.

Real wages & economic adjustment
Lesson 16 Architecture

Crosswalk & Taxonomy Evolution

SOC version changes, 1:1/split/merge/complex mappings, and why “same code” doesn’t mean “same occupation.”

Occupation code versioning
Lesson 17 Data

Extract Patterns for Government APIs

Browser header spoofing, retry with exponential backoff, format heterogeneity, and manifest-driven downloads.

Data extraction & HTTP patterns
Lesson 18 Pipeline

Derived Metrics from Base Observations

Why derived metrics need their own fact table. Self-join patterns for YoY, rolling averages, and idempotent computation.

Derived series & computation
Lesson 19 Pipeline

Ranked Movers & Outlier Interpretation

Small-denominator volatility, dual-direction sorting, and why suppressed data causes apparent outliers in top-movers lists.

Statistical interpretation
Lesson 20 Web

Geography Comparison Pitfalls

State-vs-national gaps, cost-of-living confounders, suppressed cells in small states, and dimension-driven filtering.

Geographic data challenges
Lesson 21 Web

Fetch Shim Architecture

Turning a dynamic API into a static site. JavaScript fetch interception, URL-to-file mapping, and client-side search.

Static site & deployment