Methodology

JobClass integrates six federal data products into a unified analytical warehouse, with occupation as the stable external key.

Architecture

The pipeline follows a four-layer warehouse architecture:

  1. Raw/Landing — Immutable capture of downloaded source artifacts
  2. Standardized Staging — Parsed into relational structures with explicit typing
  3. Core Warehouse — Conformed dimensions, facts, and bridges with version-aware joins
  4. Analyst Marts — Denormalized views for specific analytical questions
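The four layers can be pictured as a chain of small functions, each consuming the previous layer's output. The following is a minimal sketch, not the project's actual code: the function names (land, stage, conform, mart), the column names, and the sample values are all hypothetical and chosen only for illustration.

```python
import csv
import hashlib
import io
from dataclasses import dataclass

# Hypothetical stand-in for a downloaded source file; the values are made up.
RAW_BYTES = (
    b"occ_code,occ_title,tot_emp\n"
    b"15-1252,Software Developers,1000\n"
    b"29-1141,Registered Nurses,2000\n"
)


@dataclass(frozen=True)
class RawArtifact:
    """Layer 1: immutable capture. The checksum pins the exact bytes received."""
    source: str
    release: str
    payload: bytes
    sha256: str


def land(source: str, release: str, payload: bytes) -> RawArtifact:
    return RawArtifact(source, release, payload, hashlib.sha256(payload).hexdigest())


def stage(artifact: RawArtifact) -> list:
    """Layer 2: parse the artifact into rows with explicit types."""
    reader = csv.DictReader(io.StringIO(artifact.payload.decode("utf-8")))
    return [
        {
            "soc_code": row["occ_code"],
            "title": row["occ_title"],
            "employment": int(row["tot_emp"]),
            "release": artifact.release,
        }
        for row in reader
    ]


def conform(staged: list) -> tuple:
    """Layer 3: split staged rows into an occupation dimension and a fact list."""
    dim_occupation = {r["soc_code"]: r["title"] for r in staged}
    fact_employment = [(r["soc_code"], r["release"], r["employment"]) for r in staged]
    return dim_occupation, fact_employment


def mart(dim_occupation: dict, fact_employment: list) -> list:
    """Layer 4: denormalize facts back onto dimension attributes for analysts."""
    return [
        {"soc_code": code, "title": dim_occupation[code], "release": rel, "employment": emp}
        for code, rel, emp in fact_employment
    ]


artifact = land("OEWS", "2023", RAW_BYTES)
dim, fact = conform(stage(artifact))
rows = mart(dim, fact)
```

Keeping the raw artifact frozen and checksummed means every later layer can be rebuilt from the same bytes, which is the property the Raw/Landing layer is meant to guarantee.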


Data Sources

Source | Description | Role
SOC (Standard Occupational Classification) | Bureau of Labor Statistics | Occupation taxonomy backbone — hierarchy from major groups to detailed occupations
OEWS (Occupational Employment and Wage Statistics) | Bureau of Labor Statistics | Employment counts and wage measures by occupation, geography, and industry
O*NET (Occupational Information Network) | Department of Labor / O*NET Center | Semantic descriptors — skills, knowledge, abilities, and tasks for each occupation
Employment Projections | Bureau of Labor Statistics | 10-year employment outlook with growth rates and education requirements
CPI-U (Consumer Price Index for All Urban Consumers) | Bureau of Labor Statistics | Price index for inflation-adjusted wages; full hierarchy of items and areas
SOC Crosswalk (2010 ↔ 2018) | Bureau of Labor Statistics | Occupation code mappings between SOC taxonomy versions for historical depth

Data Quality

  • Idempotent loading — Re-running the same dataset version never creates duplicates
  • Fail-fast on schema drift — Parser blocks publication if source format changes
  • Preserved suppression — BLS suppressed values remain null, never imputed or shown as zero
  • Source lineage — Every displayed value is traceable to its source release and version
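Three of these guarantees (idempotent loading, fail-fast on schema drift, and preserved suppression) can be illustrated together. This is a hedged sketch using Python's standard sqlite3 module, not the project's loader: the table name fact_oews, the expected column list, and the suppression markers "*" and "**" are assumptions made for the example.

```python
import sqlite3

# Hypothetical staging contract and suppression markers (assumptions for this sketch).
EXPECTED_COLUMNS = ("soc_code", "employment", "mean_wage")
SUPPRESSION_MARKERS = {"*", "**"}


class SchemaDriftError(Exception):
    """Raised when the source header no longer matches the expected columns."""


def check_schema(header) -> None:
    # Fail fast: refuse to load anything if the source layout has changed.
    if tuple(header) != EXPECTED_COLUMNS:
        raise SchemaDriftError(f"expected {EXPECTED_COLUMNS}, got {tuple(header)}")


def parse_value(text: str):
    # Suppressed cells become None (SQL NULL); they are never imputed or zeroed.
    text = text.strip()
    if text in SUPPRESSION_MARKERS:
        return None
    return float(text.replace(",", ""))


def load(conn: sqlite3.Connection, release: str, header, rows) -> None:
    check_schema(header)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fact_oews (
               release    TEXT NOT NULL,
               soc_code   TEXT NOT NULL,
               employment REAL,
               mean_wage  REAL,
               PRIMARY KEY (release, soc_code)
           )"""
    )
    # The primary key plus INSERT OR REPLACE makes a re-run overwrite rather than duplicate.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO fact_oews VALUES (?, ?, ?, ?)",
            [(release, code, parse_value(emp), parse_value(wage)) for code, emp, wage in rows],
        )


conn = sqlite3.connect(":memory:")
header = ["soc_code", "employment", "mean_wage"]
sample = [("15-1252", "1,000", "*"), ("29-1141", "**", "50000")]  # made-up values
load(conn, "2023", header, sample)
load(conn, "2023", header, sample)  # second run of the same release adds no rows
```

Keying the fact table on (release, soc_code) is one simple way to get idempotence; a real warehouse might instead compare artifact checksums before loading.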

Time-Series Analysis

The warehouse supports time-indexed analysis of occupation metrics. Key concepts:

  • Comparability Mode — Two modes are maintained: As Published preserves the original values from each source release; Comparable History includes only observations from vintages sharing the same SOC taxonomy version, enabling valid trend comparisons.
  • Base vs. Derived Metrics — Base metrics (employment count, wages) come directly from source data. Derived metrics (year-over-year change, rolling average, state-vs-national gap, rank delta) are computed from base observations and always labeled as derived.
  • Projected Values — BLS Employment Projections are stored alongside observed values but are visually distinguished (dashed bars, different color) because they represent forward-looking estimates with different base years and assumptions.
  • Revisions & Discontinuities — When BLS revises OEWS data or changes the SOC taxonomy, previously published values may not be directly comparable to new releases. The comparable-history mode excludes cross-version comparisons; the as-published mode preserves the original record.
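The difference between the two comparability modes, and the labeling of derived metrics, can be sketched as follows. The Observation type, the function names, and the sample series are hypothetical; the values are invented and the SOC version assigned to each year is illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    year: int
    soc_version: str          # SOC taxonomy version of the source vintage
    value: Optional[float]    # None means the source value was suppressed


def as_published(observations: List[Observation]) -> List[Observation]:
    """As Published: keep every observation exactly as released."""
    return sorted(observations, key=lambda o: o.year)


def comparable_history(observations: List[Observation], soc_version: str) -> List[Observation]:
    """Comparable History: keep only vintages that share one SOC version."""
    return sorted((o for o in observations if o.soc_version == soc_version), key=lambda o: o.year)


def year_over_year(observations: List[Observation]) -> List[dict]:
    """Derived metric: percent change between consecutive years, labeled as derived."""
    derived = []
    for prev, cur in zip(observations, observations[1:]):
        # Skip gaps in years, suppressed values, and zero denominators.
        if cur.year - prev.year != 1 or prev.value is None or cur.value is None or prev.value == 0:
            continue
        derived.append(
            {
                "year": cur.year,
                "metric": "yoy_pct_change",
                "derived": True,
                "value": 100.0 * (cur.value - prev.value) / prev.value,
            }
        )
    return derived


# Made-up employment series spanning a taxonomy change.
series = [
    Observation(2017, "2010", 100.0),
    Observation(2018, "2010", 110.0),
    Observation(2019, "2018", 120.0),
    Observation(2020, "2018", 126.0),
    Observation(2021, "2018", None),
]

published_changes = year_over_year(as_published(series))
comparable_changes = year_over_year(comparable_history(series, "2018"))
```

In this sketch the as-published series yields a change for 2019 that spans the taxonomy switch, while the comparable-history series for SOC 2018 yields only the 2020 change. The suppressed 2021 value produces no derived record in either mode.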

CPI Domain

The CPI analytical domain elevates BLS Consumer Price Index data from an internal wage deflator into a first-class browsable domain. Key design decisions:

  • Member vs. Series Variant — A CPI member (e.g., SA0 "All Items") is the stable concept. Series variants encode specific measurement choices: index family (CPI-U, CPI-W, C-CPI-U), seasonal adjustment (S/U), and geographic area.
  • Hierarchy vs. Cross-Cutting — The BLS item hierarchy (All Items → Major Group → Expenditure Class → Item Stratum) is a strict tree. Cross-cutting aggregates (All Items Less Food & Energy, Energy, Commodities) are classified separately with semantic_role = special_aggregate.
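  • Area Rules — National and regional areas publish monthly. Smaller metro areas publish bimonthly. Area indexes measure price change over time within each area; they do not measure price-level differences between cities, so they cannot be used to rank cities by cost of living.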
  • C-CPI-U Revisions — Chained CPI values are first published as preliminary and are then revised over 10–12 months as final expenditure data become available. Preliminary and final values are tracked separately in the revision vintage table.
  • Real-Wage Integration — CPI-U All Items (SA0) annual averages serve as the deflator for real-wage metrics, using base year 2023. Formula: real_wage = nominal × CPI_2023 / CPI_year
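The real-wage formula, together with the member versus series-variant distinction, can be written out as a short sketch. The SeriesVariant fields, the choice of an unadjusted national series as the deflator, and the index values below are assumptions for illustration; the index values are round placeholders, not published CPI figures.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

BASE_YEAR = 2023


@dataclass(frozen=True)
class SeriesVariant:
    """One measurement choice for a CPI member (field names are hypothetical)."""
    member: str               # stable concept, e.g. "SA0" (All Items)
    index_family: str         # "CPI-U", "CPI-W", or "C-CPI-U"
    seasonal_adjustment: str  # "S" (adjusted) or "U" (unadjusted)
    area: str                 # geographic area label


# Assumed deflator variant: CPI-U All Items, unadjusted, national.
DEFLATOR = SeriesVariant("SA0", "CPI-U", "U", "national")

# Placeholder annual averages for the deflator variant (NOT official values).
CPI_ANNUAL_AVG = {2020: 250.0, 2023: 300.0}


def real_wage(
    nominal: Optional[float],
    cpi_by_year: Mapping[int, float],
    year: int,
    base_year: int = BASE_YEAR,
) -> Optional[float]:
    """Express a nominal wage observed in `year` in base-year dollars."""
    if nominal is None:
        # A suppressed wage stays suppressed after deflation.
        return None
    return nominal * cpi_by_year[base_year] / cpi_by_year[year]


wage_2020_in_2023_dollars = real_wage(50000.0, CPI_ANNUAL_AVG, 2020)
```

With these placeholder index values, a nominal wage of 50000 observed in 2020 becomes 60000 in 2023 dollars, and a wage observed in the base year is returned unchanged.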
