Methodology
JobClass integrates the federal data products listed under Data Sources into a unified analytical warehouse, with the occupation code as the stable external key.
Architecture
The pipeline follows a four-layer warehouse architecture:
- Raw/Landing — Immutable capture of downloaded source artifacts
- Standardized Staging — Parsed into relational structures with explicit typing
- Core Warehouse — Conformed dimensions, facts, and bridges with version-aware joins
- Analyst Marts — Denormalized views for specific analytical questions
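The version-aware join in the Core Warehouse layer can be illustrated with a minimal sketch. It assumes a hypothetical occupation dimension keyed on (soc_code, soc_version) and a hypothetical wage fact; the class names, field names, and wage figures are invented for illustration and are not the actual JobClass schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OccupationDim:
    # Hypothetical conformed dimension: one row per code per taxonomy version.
    soc_code: str
    soc_version: str
    title: str


@dataclass(frozen=True)
class WageFact:
    # Hypothetical fact row; median_wage stays None when the source suppressed it.
    soc_code: str
    soc_version: str
    year: int
    median_wage: Optional[float]


def build_mart(dims, facts):
    """Join each fact to the dimension row with the same code AND taxonomy version."""
    by_key = {(d.soc_code, d.soc_version): d for d in dims}
    rows, unmatched = [], []
    for fact in facts:
        dim = by_key.get((fact.soc_code, fact.soc_version))
        if dim is None:
            unmatched.append(fact)  # code does not exist in that taxonomy version
            continue
        rows.append({
            "soc_code": fact.soc_code,
            "soc_version": fact.soc_version,
            "title": dim.title,
            "year": fact.year,
            "median_wage": fact.median_wage,
        })
    return rows, unmatched


dims = [
    OccupationDim("15-1132", "2010", "Software Developers, Applications"),
    OccupationDim("15-1252", "2018", "Software Developers"),
]
facts = [  # wage values are made up for the example
    WageFact("15-1132", "2010", 2018, 100000.0),
    WageFact("15-1252", "2018", 2021, 110000.0),
    WageFact("15-1252", "2010", 2019, 105000.0),  # version mismatch on purpose
]
rows, unmatched = build_mart(dims, facts)
```

Joining on the code alone would attach the third fact to the 2018 title even though that code is not part of the 2010 taxonomy; keying on both columns surfaces the mismatch instead.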
Data Sources
| Source | Publisher | Role |
|---|---|---|
| SOC (Standard Occupational Classification) | Bureau of Labor Statistics | Occupation taxonomy backbone — hierarchy from major groups to detailed occupations |
| OEWS (Occupational Employment and Wage Statistics) | Bureau of Labor Statistics | Employment counts and wage measures by occupation, geography, and industry |
| O*NET (Occupational Information Network) | Department of Labor / O*NET Center | Semantic descriptors — skills, knowledge, abilities, and tasks for each occupation |
| Employment Projections | Bureau of Labor Statistics | 10-year employment outlook with growth rates and education requirements |
| CPI-U (Consumer Price Index for All Urban Consumers) | Bureau of Labor Statistics | Price index for inflation-adjusted wages; full hierarchy of items and areas |
| SOC Crosswalk (2010 ↔ 2018) | Bureau of Labor Statistics | Occupation code mappings between SOC taxonomy versions for historical depth |
Data Quality
- Idempotent loading — Re-running the same dataset version never creates duplicates
- Fail-fast on schema drift — Parser blocks publication if source format changes
- Preserved suppression — BLS suppressed values remain null, never imputed or shown as zero
- Source lineage — Every displayed value is traceable to its source release and version
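The first three guarantees above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's loader: the column names, the suppression markers ("*" and "**"), the in-memory dict standing in for the warehouse, and the sample values are all hypothetical.

```python
EXPECTED_COLUMNS = ("occ_code", "area", "tot_emp", "a_median")  # assumed layout
SUPPRESSION_MARKERS = {"*", "**"}  # assumed markers for suppressed cells
NUMERIC_COLUMNS = ("tot_emp", "a_median")


class SchemaDriftError(Exception):
    """Raised when a source file no longer matches the expected layout."""


def parse_rows(header, rows):
    # Fail fast: refuse to parse anything if the header has changed.
    if tuple(header) != EXPECTED_COLUMNS:
        raise SchemaDriftError(f"expected {EXPECTED_COLUMNS}, got {tuple(header)}")
    parsed = []
    for row in rows:
        rec = dict(zip(header, row))
        for col in NUMERIC_COLUMNS:
            raw = rec[col].strip()
            # Preserve suppression as None; never substitute 0 or an estimate.
            rec[col] = None if raw in SUPPRESSION_MARKERS else float(raw.replace(",", ""))
        parsed.append(rec)
    return parsed


def load(warehouse, version, records):
    # Idempotent: the natural key includes the dataset version, so a re-run
    # overwrites identical rows instead of appending duplicates.
    for rec in records:
        warehouse[(version, rec["occ_code"], rec["area"])] = rec


header = ["occ_code", "area", "tot_emp", "a_median"]
rows = [["15-1252", "99", "1,000", "*"]]  # made-up values

warehouse = {}
load(warehouse, "2023-05", parse_rows(header, rows))
load(warehouse, "2023-05", parse_rows(header, rows))  # second run adds nothing
```

In a relational store the same effect would typically come from a unique constraint on the natural key combined with an upsert.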
Time-Series Analysis
The warehouse supports time-indexed analysis of occupation metrics. Key concepts:
- Comparability Mode — Two modes are maintained: As Published preserves the original values from each source release; Comparable History includes only observations from vintages sharing the same SOC taxonomy version, enabling valid trend comparisons.
- Base vs. Derived Metrics — Base metrics (employment count, wages) come directly from source data. Derived metrics (year-over-year change, rolling average, state-vs-national gap, rank delta) are computed from base observations and always labeled as derived.
- Projected Values — BLS Employment Projections are stored alongside observed values but are visually distinguished (dashed bars, different color) because they represent forward-looking estimates with different base years and assumptions.
- Revisions & Discontinuities — When BLS revises OEWS data or changes the SOC taxonomy, previously published values may not be directly comparable to new releases. The comparable-history mode excludes cross-version comparisons; the as-published mode preserves the original record.
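As a rough sketch of how Comparable History and a derived metric might fit together, the example below filters observations to one SOC taxonomy version and then computes a year-over-year change that is explicitly labeled as derived. The Observation type, the function names, and the employment figures are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    year: int
    soc_version: str  # taxonomy version of the release that published the value
    employment: Optional[int]  # None when the source value was suppressed


def comparable_history(observations, soc_version):
    """Keep only observations published under one SOC version, ordered by year."""
    kept = [o for o in observations if o.soc_version == soc_version]
    return sorted(kept, key=lambda o: o.year)


def yoy_change(series):
    """Derived metric: fractional change versus the previous year."""
    out = []
    for prev, cur in zip(series, series[1:]):
        consecutive = cur.year - prev.year == 1
        if not consecutive or prev.employment in (None, 0) or cur.employment is None:
            value = None  # a gap or suppression propagates as None, not as 0
        else:
            value = (cur.employment - prev.employment) / prev.employment
        out.append({"year": cur.year, "metric": "yoy_change",
                    "derived": True, "value": value})
    return out


as_published = [  # made-up figures
    Observation(2018, "2010", 100),
    Observation(2019, "2018", 200),
    Observation(2020, "2018", 210),
    Observation(2021, "2018", None),
]
comparable = comparable_history(as_published, "2018")
changes = yoy_change(comparable)
```

The 2018 observation is dropped before any change is computed, so no derived value ever spans the taxonomy switch, while the as_published list itself is left untouched.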
CPI Domain
The CPI analytical domain elevates BLS Consumer Price Index data from an internal wage deflator into a first-class browsable domain. Key design decisions:
- Member vs. Series Variant — A CPI member (e.g., SA0 "All Items") is the stable concept. Series variants encode specific measurement choices: index family (CPI-U, CPI-W, C-CPI-U), seasonal adjustment (S/U), and geographic area.
- Hierarchy vs. Cross-Cutting — The BLS item hierarchy (All Items → Major Group → Expenditure Class → Item Stratum) is a strict tree. Cross-cutting aggregates (All Items Less Food & Energy, Energy, Commodities) are classified separately with semantic_role = special_aggregate.
- Area Rules — National and regional areas publish monthly. Smaller metro areas publish bimonthly. Area indexes do not measure price-level differences between cities.
- C-CPI-U Revisions — The Chained CPI initially releases preliminary values, revised over 10–12 months as final expenditure data become available. Preliminary and final values are tracked separately in the revision vintage table.
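- Real-Wage Integration — CPI-U All Items (SA0) annual averages serve as the deflator for real-wage metrics, using base year 2023. Formula: real_wage = nominal × CPI_2023 / CPI_year, where CPI_year is the annual average for the year of the nominal wage and CPI_2023 is the annual average for the base year.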
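The deflation formula can be checked with a small worked example. The function name and the index values below are placeholders chosen to make the arithmetic easy to follow; they are not actual BLS annual averages.

```python
from typing import Optional


def real_wage(nominal: Optional[float], cpi_year: float, cpi_base: float) -> Optional[float]:
    """Express a nominal wage in base-year dollars: nominal * cpi_base / cpi_year."""
    if nominal is None:
        return None  # a suppressed wage stays suppressed after deflation
    if cpi_year <= 0 or cpi_base <= 0:
        raise ValueError("CPI index values must be positive")
    return nominal * cpi_base / cpi_year


# Placeholder index values, not real CPI-U figures.
CPI_BASE = 300.0   # stands in for the base-year annual average
CPI_EARLIER = 250.0  # stands in for the annual average of an earlier year

adjusted = real_wage(50_000.0, CPI_EARLIER, CPI_BASE)
unchanged = real_wage(50_000.0, CPI_BASE, CPI_BASE)
suppressed = real_wage(None, CPI_EARLIER, CPI_BASE)
```

With these placeholders the earlier-year wage scales by 300 / 250 = 1.2, and a wage already stated in base-year dollars is returned unchanged because the ratio is 1.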