The Federal Labor Data Landscape
The Occupation Code Is Everything
Every data product in this system is keyed on the Standard Occupational Classification (SOC)
code. A SOC code like 11-1011 identifies "Chief Executives": the first two digits (11)
give the major group (Management), and the remaining four digits (1011) narrow to the
specific occupation. SOC 2018 defines 867 detailed occupations.
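To make the code structure concrete, here is a minimal sketch of splitting a SOC code into its parts. The `SocCode` type and `parse_soc` function are hypothetical names for illustration, not part of this project. The sketch derives only the major group and broad occupation, since those follow directly from digit positions.

```python
import re
from typing import NamedTuple

# A SOC code is two digits, a hyphen, then four digits (e.g. "11-1011").
SOC_PATTERN = re.compile(r"^(\d{2})-(\d{4})$")


class SocCode(NamedTuple):
    code: str              # the code as given, e.g. "11-1011"
    major_group: str       # first two digits, zero-padded: "11-0000"
    broad_occupation: str  # last digit replaced by 0: "11-1010"


def parse_soc(code: str) -> SocCode:
    """Validate a SOC code and derive its major group and broad occupation."""
    match = SOC_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a SOC code: {code!r}")
    major, rest = match.groups()
    return SocCode(
        code=f"{major}-{rest}",
        major_group=f"{major}-0000",
        broad_occupation=f"{major}-{rest[:3]}0",
    )
```

For example, `parse_soc("11-1011")` yields a major group of `11-0000` and a broad occupation of `11-1010`.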
The SOC taxonomy is published by BLS and updated roughly every 10 years. The current version is SOC 2018. This matters because when the taxonomy changes, occupation codes can split, merge, or disappear — breaking any naive time-series comparison.
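One way to see why a taxonomy change breaks naive comparisons is to scan a crosswalk for codes that do not map one-to-one. The sketch below assumes the crosswalk is available as (old code, new code) pairs. The function name is hypothetical, and the sample codes are made-up placeholders rather than real SOC codes.

```python
from collections import defaultdict


def find_splits_and_merges(crosswalk):
    """Return (splits, merges) from an iterable of (old_code, new_code) pairs.

    splits maps an old code to the several new codes it was divided into.
    merges maps a new code to the several old codes that were combined into it.
    """
    old_to_new = defaultdict(set)
    new_to_old = defaultdict(set)
    for old, new in crosswalk:
        old_to_new[old].add(new)
        new_to_old[new].add(old)
    splits = {old: sorted(news) for old, news in old_to_new.items() if len(news) > 1}
    merges = {new: sorted(olds) for new, olds in new_to_old.items() if len(olds) > 1}
    return splits, merges


# Placeholder codes for illustration only; these are not real SOC codes.
sample_crosswalk = [
    ("AA-0001", "AA-0001"),  # unchanged
    ("AA-0002", "AA-0003"),  # AA-0002 splits into two codes
    ("AA-0002", "AA-0004"),
    ("AA-0005", "AA-0007"),  # two codes merge into AA-0007
    ("AA-0006", "AA-0007"),
]
splits, merges = find_splits_and_merges(sample_crosswalk)
```

Any occupation that appears in `splits` or `merges` has no single counterpart across versions, so its employment or wage series cannot be compared across the change without an explicit allocation rule.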
Four Data Products, One Key
| Source | Publisher | What It Contains | Update Frequency |
|---|---|---|---|
| SOC | BLS | Occupation hierarchy: major → minor → broad → detailed | ~10 years |
| OEWS | BLS | Employment counts and wage statistics by occupation, geography, and industry | Annual (May reference period) |
| O*NET | DOL / O*NET Center | Skills, knowledge, abilities, and tasks for each occupation | Continuous (versioned releases) |
| Employment Projections | BLS | 10-year employment outlook with growth rates | Annual since 2019 (previously every 2 years) |
How They Connect
SOC (taxonomy backbone)
├── OEWS links via occupation_code → how many people, how much they earn
├── O*NET links via SOC code → what skills and tasks the job requires
└── Projections link via SOC code → where employment is heading
The SOC taxonomy must load first. Every other data product joins to dim_occupation
through the SOC code. If a source row uses a code that doesn't exist in the loaded SOC version,
that row is excluded. This is expected for about 5 codes in NEM 2024 (the National
Employment Matrix used by Employment Projections) that don't map to SOC 2018.
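The exclusion rule can be sketched as a filter that runs after the SOC taxonomy is loaded. This is an illustration under stated assumptions, not the project's actual loader: the function name, the dict-based row shape, and the default `occupation_code` key are all hypothetical. Returning the excluded rows as well makes it easy to check that only the expected handful of codes were dropped.

```python
def partition_by_known_soc(rows, known_codes, key="occupation_code"):
    """Split source rows into (kept, excluded) by SOC code membership.

    rows        -- iterable of dicts from a source such as OEWS or Projections
    known_codes -- set of SOC codes already loaded into the occupation dimension
    key         -- name of the field holding the SOC code (assumed here)
    """
    kept, excluded = [], []
    for row in rows:
        if row[key] in known_codes:
            kept.append(row)
        else:
            excluded.append(row)
    return kept, excluded


# Illustrative data: "99-9999" stands in for a code absent from the loaded SOC version.
known = {"11-1011", "15-1252"}
source_rows = [
    {"occupation_code": "11-1011", "source": "example"},
    {"occupation_code": "99-9999", "source": "example"},
]
kept, excluded = partition_by_known_soc(source_rows, known)
```

Logging the codes in `excluded` at load time gives a quick check: a handful of unmapped codes is expected, while a large count suggests the wrong SOC version was loaded.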
Why This Matters
Key insight: You cannot treat these as four independent datasets. They are four views of the same occupational reality, and the SOC code is the thread that connects them. Designing the warehouse around this fact — occupation as the stable external key — is the single most important architectural decision in this project.