---
title: "Hennepin example"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Hennepin example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  eval = FALSE,
  message = FALSE,
  warning = FALSE,
  error = FALSE
)
```

```{r libs}
library(shellgame)
library(geoDeltaAudit)
library(dplyr)
library(stringr)
library(janitor)

# vignette-only dependency; keep in Suggests
if (!requireNamespace("readr", quietly = TRUE)) {
  stop("Package 'readr' is required to run this vignette. Install it with install.packages('readr').")
}
```

# Introduction

This vignette demonstrates a complete transformation audit using Hennepin County, Minnesota as an example.

We'll track total population through the transformation chain:

**ZCTA → ZIP → COUNTY**

Along the way, we reveal the shell game: the column name stays the same ("population") while the underlying quantity changes (observed → imputed).

# The Workflow

## Step 1: Prepare the Data

For this example, we'll use the toy ACS and HUD crosswalk files shipped with `geoDeltaAudit`, which stand in for the data you would typically prepare:

```{r eval=FALSE}
# Locate the toy input files bundled with geoDeltaAudit
acs_path <- system.file("extdata", "toy_acs_zcta_hennepin.csv", package = "geoDeltaAudit")
hud_path <- system.file("extdata", "toy_zip_county_hud_hennepin.csv", package = "geoDeltaAudit")

# system.file() returns "" when a file is not found
stopifnot(nchar(acs_path) > 0, nchar(hud_path) > 0)

# ACS population by ZCTA; pad ZCTA codes to five characters
acs <- readr::read_csv(acs_path, show_col_types = FALSE) |>
  janitor::clean_names() |>
  dplyr::mutate(zcta = stringr::str_pad(as.character(.data$zcta), 5, pad = "0"))

# HUD ZIP -> county crosswalk
hud <- readr::read_csv(hud_path, show_col_types = FALSE) |>
  janitor::clean_names()

# Toy assoc: 1:1 ZCTA -> ZIP so the example always runs
assoc <- acs |>
  dplyr::distinct(.data$zcta) |>
  dplyr::transmute(zcta = .data$zcta, zip = .data$zcta) |>
  dplyr::distinct()

list(
  acs_rows = nrow(acs),
  assoc_rows = nrow(assoc),
  hud_rows = nrow(hud)
)
```

## Step 2: Run the Audit

```{r run-audit, eval=FALSE, echo=TRUE}
# example only (not executed during vignette build)
result <- shellgame::evaluate_transformation(
  data = acs,
  zip_zcta_map = assoc,
  hud_crosswalk = hud,
  geo_col = "zcta",
  var_col = "pop"
)
```

## Step 3: View Results

```{r eval=FALSE}
# Print summary
summary(result)
```

For the Hennepin County example, the summary looks like this:

```text
=== The Shell Game: Transformation Audit ===

Variable: population
Target County: 27053

--- Baseline (Observed Data) ---
Units: 74 ZCTAs
Total: 1,391,557

--- After Transformation (Imputed Data) ---
Intermediate: 98 ZIPs
Recovered: 1,216,874

--- The Shell Game Result ---
Perturbation: -174,683 (-12.6%)

Same column name. Different underlying quantity.
That's the shell game.

--- Pre-Allocation Expansion ---
74 ZCTAs → 98 ZIPs (+32.4%)

This happens BEFORE any allocation or weighting.
The analytical surface has already shifted.

--- Top Counties Receiving Perturbed Population ---
27003: 30,535
27139: 25,268
27123: 21,835
27171: 14,391
27059: 9,526
```

## Membership Visualization

*Note: The following graphics are pre-rendered from the configured Hennepin County example dataset to illustrate the spatial relationships being audited.*

```{r figures, echo=FALSE, eval=TRUE}
knitr::include_graphics(c(
  "baseline_hennepin.png",
  "hennepin_relationship.png"
))
```

## Baseline: 74 ZCTAs

The analysis begins with 74 ZCTAs that have a relationship-based membership with Hennepin County. These are the ZCTAs used by the Census Bureau in ACS tabulations.

**Total population: 1,391,557** (directly observed from ACS)

```{r baseline-fig, echo=FALSE, eval=TRUE}
knitr::include_graphics("baseline_hennepin.png")
```

## The First Hop: ZCTA → ZIP

When we associate these 74 ZCTAs with ZIP codes:

- ZCTA 55401 → 8 ZIPs
- ZCTA 55402 → 6 ZIPs
- Several others → 2-5 ZIPs

**Result: 74 ZCTAs become 98 ZIPs (+32.4%)**

This happens **before any allocation**. The analytical surface has already shifted.

## The Second Hop: ZIP → County

Using HUD's TOT_RATIO, we allocate ZIP-level population to counties.

**Result: Population recovered for Hennepin County: 1,216,874**

## The Perturbation

**174,683 people (-12.6%) disappeared in the transformation.**

Where did they go?
To neighboring counties:

```{r eval=FALSE}
extract_perturbed_population(result, top_n = 5)
```

# Geometric vs Relationship Membership

If we used geometric intersection instead of relationship-based membership, we would have 94 ZCTAs, not 74.

**This is Decision #1**: How do we define membership?

The 20 extra ZCTAs (shown in grey) intersect the county boundary geometrically but are not included in the relationship-based membership used by ACS.

# Visualizing the Difference

```{r baseline-membership, echo=FALSE, eval=TRUE}
knitr::include_graphics("baseline_hennepin.png")
```

The baseline: 74 ZCTAs with relationship-based membership.

```{r geometric-membership, echo=FALSE, eval=TRUE}
knitr::include_graphics("hennepin_relationship.png")
```

The difference: Grey areas show ZCTAs that appear only under geometric intersection.

# The Shell Game Revealed

```{r eval=FALSE}
# `audit_result` is assumed to hold the object returned by
# geoDeltaAudit::audit_transform(); it is not created earlier in this vignette.

# Normalize expected fields from geoDeltaAudit::audit_transform()
baseline_total <- as.numeric(audit_result$baseline_total)
final_total <- as.numeric(audit_result$final_total)

# Use the provided delta when present; otherwise compute it
delta <- if (!is.null(audit_result$delta)) {
  as.numeric(audit_result$delta)
} else {
  final_total - baseline_total
}

# Size of the perturbation regardless of sign
absolute_perturbation <- abs(delta)
```

Same column name: "population"

Different underlying quantity: observed → imputed

**That's the shell game.**

# Why This Matters

This error is **agnostic** to:

- **Variable**: Try this with median income (B19013_001) - same % error
- **Tool**: Run this in Python or Stata - same % error
- **Geography**: Try another county - same pattern

**Transformation is the cause, not the tool or variable.**

# Next Steps

- Try this audit with your own geography
- Test with different ACS variables
- Compare different crosswalk versions
- Document your hidden decisions

See `vignette("data-preparation")` for how to prepare your own data.

See `vignette("conceptual_framework-shell-game")` for the conceptual explanation.
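
As a quick sanity check, the headline figures quoted in this vignette (the -174,683 perturbation, the -12.6% change, and the +32.4% ZCTA-to-ZIP expansion) can be re-derived with plain arithmetic. The following is a minimal base R sketch that uses only the numbers printed in the audit summary. The variable names are illustrative assumptions and are not part of `shellgame` or `geoDeltaAudit`.

```r
# Totals copied from the audit summary (hard-coded, not computed from data)
baseline_total <- 1391557  # observed ACS population across 74 ZCTAs
final_total    <- 1216874  # population recovered after ZCTA -> ZIP -> county

# Perturbation: change in the "same" population column after transformation
delta     <- final_total - baseline_total
pct_delta <- 100 * delta / baseline_total

# Pre-allocation expansion: ZCTAs fan out into ZIPs before any weighting
n_zcta <- 74
n_zip  <- 98
pct_expansion <- 100 * (n_zip - n_zcta) / n_zcta

# Population listed for the top five receiving counties in the summary
top_counties <- c(`27003` = 30535, `27139` = 25268, `27123` = 21835,
                  `27171` = 14391, `27059` = 9526)
top5_share <- 100 * sum(top_counties) / abs(delta)

cat(sprintf("Perturbation: %s (%.1f%%)\n",
            format(delta, big.mark = ","), pct_delta))
cat(sprintf("Expansion: %d ZCTAs -> %d ZIPs (+%.1f%%)\n",
            n_zcta, n_zip, pct_expansion))
cat(sprintf("Top five counties' share of the perturbation: %.1f%%\n",
            top5_share))
```

The five counties listed cover only part of the perturbation, which is consistent with `top_n = 5` truncating the list of receiving counties.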