Multi-Vintage Query Pitfalls
The Core Problem
Once the warehouse holds multiple vintages of the same dataset (OEWS 2021, 2022, 2023), every query must explicitly decide: do I want the latest snapshot, or all history? Forgetting this decision produces silent data quality bugs.
Trap 1: Duplicate Rows in Snapshot Views
The wages comparison page (/occupation/{soc}/wages) showed 150 rows for
50 states — three rows per state, one per OEWS vintage. The page was designed
to show the latest snapshot, but the query had no vintage filter:
# BROKEN: returns all vintages
SELECT g.geo_name, f.mean_annual_wage, ...
FROM fact_occupation_employment_wages f
JOIN dim_geography g ON f.geography_key = g.geography_key
WHERE o.soc_code = '11-2021' AND g.geo_type = 'state'
# Result: Alabama (2021), Alabama (2022), Alabama (2023), Alaska (2021), ...
# 50 states × 3 vintages = 150 rows
# FIXED: latest vintage only
SELECT MAX(source_release_id) FROM fact_occupation_employment_wages
-- → '2023.05'
SELECT g.geo_name, f.mean_annual_wage, ...
WHERE ... AND f.source_release_id = '2023.05'
# Result: Alabama, Alaska, ... (50 rows)
Rule: Any endpoint that shows a point-in-time snapshot must
filter to a specific source_release_id. Only time-series and
trend endpoints should return multiple vintages.
Trap 2: Derived Metrics Tied to Comparability Mode
The Trend Explorer page showed YoY Change and YoY % as N/A for every
year. The calculations were correct — but they only existed in
comparable mode, while the UI defaulted to as_published.
# fact_derived_series stores YoY only for comparable mode:
comparability_mode = 'comparable' → yoy_change = 33420, yoy_pct = 15.69
comparability_mode = 'as_published' → (no rows)
# The API query correctly LEFT JOINs to derived series,
# but when comparability_mode = 'as_published', the join finds nothing → NULL
The fix was simple: change the default <select> option from
as_published to comparable. The data existed all along
— the UI was pointing at the wrong slice.
Rule: When derived metrics depend on a specific comparability mode, the UI must default to that mode. Otherwise users see empty columns and assume the feature is broken.
Trap 3: Percentages Without Absolute Context
The Ranked Movers page showed only YoY percent change. Without the absolute change, users couldn't distinguish meaningful moves from noise:
# Without absolute context:
Athletes and Sports Competitors: +206% ← How much is that?
Broadcast Announcers: +178% ← Meaningful or tiny base?
# With absolute context:
Athletes and Sports Competitors: +$241,150 (+206%) ← Massive jump
Broadcast Announcers: +$99,620 (+178%) ← Also large
The fix: join to yoy_absolute_change in the movers API query
and display both columns in the UI. Wage metrics format as dollars; employment
metrics format as counts.
Decision Guide: Latest vs. All Vintages
| Page / Endpoint | Vintage Policy | Filter |
|---|---|---|
| Occupation profile, wages comparison | Latest only | source_release_id = MAX(...) |
| Trend Explorer, time-series API | All vintages | No filter (ordered by year) |
| Ranked Movers | Latest derived only | Implicit (latest period in derived series) |
| Geography comparison | Specific year (user-selected or latest) | tp.year = ? |