Time-Series Normalization — Lessons

Facts Are Snapshots, Not Series

The fact_occupation_employment_wages table stores one row per (occupation, geography, period, release). It records "in OEWS release 2023.05, occupation 11-1011 in the national scope had employment of 200,480." This is a snapshot — a single measurement at a single point in time.

To answer "how has employment for Chief Executives changed from 2021 to 2023?", you need to pull three snapshots, one from each annual release (2021, 2022, and 2023), and align them by time period. That alignment is what time-series normalization does.
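As a rough illustration of that alignment step, the sketch below collapses snapshot rows into a year-keyed series for one occupation and geography. The `Snapshot` type, the `align_by_year` helper, the occupation code `99-9999`, and all values are hypothetical placeholders, not real OEWS figures or the pipeline's actual code.

```python
from typing import Dict, Iterable, NamedTuple


class Snapshot(NamedTuple):
    """One fact row: a single measurement from a single release (hypothetical shape)."""
    occupation: str
    geography: str
    year: int
    release: str
    employment: int


def align_by_year(
    snapshots: Iterable[Snapshot], occupation: str, geography: str
) -> Dict[int, int]:
    """Return {year: employment} for one occupation/geography, ordered by year."""
    series = {}
    for snap in snapshots:
        if snap.occupation == occupation and snap.geography == geography:
            series[snap.year] = snap.employment
    return dict(sorted(series.items()))


# Placeholder rows, deliberately out of order; values are illustrative only.
rows = [
    Snapshot("99-9999", "national", 2023, "2023.05", 121),
    Snapshot("99-9999", "national", 2021, "2021.05", 100),
    Snapshot("99-9999", "state-XX", 2022, "2022.05", 7),
    Snapshot("99-9999", "national", 2022, "2022.05", 110),
]

series = align_by_year(rows, "99-9999", "national")
print(series)
```

Three independent snapshots become one ordered series, and the row for the other geography is left out.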

The Time-Series Schema

dim_time_period              — (period_key, year, period_type)
dim_metric                   — (metric_key, metric_name, display_name, unit)
fact_time_series_observation — (occupation_key, geography_key, period_key, metric_key, value, ...)
fact_derived_series          — (occupation_key, geography_key, period_key, metric_key, value, ...)

Base observations are loaded from fact_occupation_employment_wages into fact_time_series_observation. Derived series (YoY change, rolling averages, state-vs-national gap, rank delta) are computed from those base observations and written to fact_derived_series. The two are stored separately because derived values should never be treated as source data.
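One way to picture the separation is to give base and derived rows distinct types, so that a derived value cannot be written to the base table by accident. This is a minimal sketch under that assumption: the class names and the `store` helper are hypothetical, and only the columns listed in the schema above are modeled (the elided columns are left out).

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SeriesKey:
    """The four key columns shared by both time-series fact tables."""
    occupation_key: int
    geography_key: int
    period_key: int
    metric_key: int


@dataclass(frozen=True)
class BaseObservation:
    """Row destined for fact_time_series_observation (source data)."""
    key: SeriesKey
    value: float


@dataclass(frozen=True)
class DerivedObservation:
    """Row destined for fact_derived_series (computed data)."""
    key: SeriesKey
    value: float


def store(
    row: Union[BaseObservation, DerivedObservation],
    base_table: List[BaseObservation],
    derived_table: List[DerivedObservation],
) -> None:
    """Route a row to its table by type; anything else is rejected."""
    if isinstance(row, BaseObservation):
        base_table.append(row)
    elif isinstance(row, DerivedObservation):
        derived_table.append(row)
    else:
        raise TypeError(f"unsupported row type: {type(row).__name__}")


# Placeholder keys and values, for illustration only.
base_rows: List[BaseObservation] = []
derived_rows: List[DerivedObservation] = []
key = SeriesKey(occupation_key=1, geography_key=1, period_key=3, metric_key=1)
store(BaseObservation(key, 121.0), base_rows, derived_rows)
store(DerivedObservation(key, 11.0), base_rows, derived_rows)
```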

Two Modes: As Published vs. Comparable

As Published preserves every observation exactly as it appeared in its source release. If BLS published employment = 200,480 in OEWS 2023.05, that value goes into the time series regardless of whether the SOC taxonomy changed between releases.

Comparable History only includes observations from vintages that share the same SOC taxonomy version. If OEWS 2021, 2022, and 2023 all use SOC 2018, all three are comparable. But if a future release uses SOC 2028 with different occupation definitions, comparing its employment count to a SOC 2018 count would be misleading.

Design decision: Both modes are always computed. The UI lets the user choose which view they want. This avoids forcing an opinion about comparability into the data layer.
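The two modes can be sketched as two views over the same observations. The `Observation` type, the `soc_version` field, and both function names are assumptions made for illustration. The SOC 2028 row stands in for the hypothetical future release mentioned above, and all values are placeholders.

```python
from typing import Iterable, List, NamedTuple


class Observation(NamedTuple):
    """One base observation tagged with its source release and SOC version."""
    year: int
    release: str
    soc_version: str
    value: float


def as_published(observations: Iterable[Observation]) -> List[Observation]:
    """Keep every observation exactly as released, ordered by year."""
    return sorted(observations, key=lambda o: o.year)


def comparable_history(
    observations: Iterable[Observation], soc_version: str
) -> List[Observation]:
    """Keep only observations whose release used the given SOC version."""
    kept = (o for o in observations if o.soc_version == soc_version)
    return sorted(kept, key=lambda o: o.year)


# Placeholder data; the last row models a hypothetical future SOC 2028 release.
observations = [
    Observation(2021, "2021.05", "2018", 100.0),
    Observation(2022, "2022.05", "2018", 110.0),
    Observation(2023, "2023.05", "2018", 121.0),
    Observation(2030, "2030.05", "2028", 90.0),
]

published = as_published(observations)
comparable = comparable_history(observations, soc_version="2018")
```

Because both views are plain filters over the same rows, computing both up front is cheap, and choosing between them can stay a presentation concern.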

Derived Metrics

From base observations, the pipeline computes:

Metric	Formula	Minimum Data Required
Year-over-year absolute change	`value(year) - value(year-1)`	2 consecutive years
Year-over-year percent change	`(value(year) - value(year-1)) / value(year-1) × 100`	2 consecutive years
3-year rolling average	`avg(value) over 3-year window`	3 consecutive years
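State vs. national gap	`state_value - national_value`	State and national values for the same period
Rank delta	`rank(year-1) - rank(year)` (positive when the rank number falls, i.e. the occupation moved toward rank 1)	2 consecutive years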

These derived values are labeled as such in the database and the UI. They are never mixed with source observations.
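The formulas in the table can be sketched as small functions over a year-keyed series. This is an illustrative sketch, not the pipeline's implementation. The function names are hypothetical, and returning `None` when the minimum data is missing (or when the prior-year value is zero, which would make the percent change undefined) is an assumption about how gaps might be handled.

```python
from typing import Mapping, Optional

Series = Mapping[int, float]  # year -> value for one occupation/geography/metric


def yoy_absolute_change(series: Series, year: int) -> Optional[float]:
    """value(year) - value(year-1); None unless both years are present."""
    if year not in series or year - 1 not in series:
        return None
    return series[year] - series[year - 1]


def yoy_percent_change(series: Series, year: int) -> Optional[float]:
    """Absolute change as a percentage of the prior year; None if undefined."""
    change = yoy_absolute_change(series, year)
    if change is None or series[year - 1] == 0:
        return None
    return change / series[year - 1] * 100


def rolling_average_3yr(series: Series, year: int) -> Optional[float]:
    """Mean over (year-2, year-1, year); None unless all three are present."""
    window = [series.get(y) for y in (year - 2, year - 1, year)]
    if any(v is None for v in window):
        return None
    return sum(window) / 3


def state_vs_national_gap(state_value: float, national_value: float) -> float:
    """state_value - national_value, both taken from the same period."""
    return state_value - national_value


def rank_delta(ranks: Mapping[int, int], year: int) -> Optional[int]:
    """rank(year-1) - rank(year); positive means the rank number fell."""
    if year not in ranks or year - 1 not in ranks:
        return None
    return ranks[year - 1] - ranks[year]


# Placeholder series and ranks, for illustration only.
employment = {2021: 100.0, 2022: 110.0, 2023: 121.0}
ranks = {2022: 5, 2023: 3}

print(yoy_absolute_change(employment, 2023))
print(yoy_percent_change(employment, 2023))
print(rolling_average_3yr(employment, 2023))
print(rank_delta(ranks, 2023))
```

Returning `None` rather than a number for an incomplete window keeps a missing year from silently producing a misleading derived value.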