--- title: "Geospatial Distribution Dynamics With griddy" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Geospatial Distribution Dynamics With griddy} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r, message=FALSE, warning=FALSE} library(griddy) library(dplyr) library(ggplot2) library(sf) library(sfdep) library(spData) library(tidyr) ``` ```{r map-theme, include=FALSE} map_theme <- function() { theme_void(base_size = 11) + theme( legend.position = "bottom", plot.title.position = "plot", strip.text = element_text(face = "bold") ) } ``` `griddy` works on long spatial panels: one row per place per period, keyed by explicit `id`, `time`, and `value` columns. The bundled `usjoin` panel contains 48 contiguous US states, per-capita personal income, and annual observations from 1929 to 2009. It mirrors the canonical PySAL `giddy` example used in Rey (2001). ```{r} data(usjoin) usjoin ``` ## Classification Pooled quantile classification cuts the entire panel into five bins and reuses the same breakpoints for every year. That keeps class meanings comparable across periods, which matters because year-specific quantiles can hide absolute movement. A state in Q5 in 2009 is being compared to the same income scale as a state in Q5 in 1929, not merely to its contemporaries. ```{r} classes <- classify_dynamics(usjoin, name, year, income, k = 5) class_intervals(classes) ``` Nominal income growth means the endpoints are nearly homogeneous under pooled breaks: the early panel sits near the bottom of the pooled distribution and the late panel sits near the top. A mid-panel year is a better mapped diagnostic because it shows both the absolute class scale and the cross-state ordering. ```{r} states <- us_states |> st_drop_geometry() |> filter(NAME %in% usjoin$name) |> pull(NAME) geom <- us_states |> filter(NAME %in% usjoin$name) |> arrange(NAME) ``` ```{r, fig.alt="Map of contiguous US states showing pooled income classes in 1955 using the same five class intervals estimated from the full 1929 to 2009 panel.", fig.width=6.8, fig.height=4} class_map <- classes |> filter(year == 1955) |> left_join(geom |> select(NAME, geometry), by = c("name" = "NAME")) |> st_as_sf() ggplot(class_map) + geom_sf(aes(fill = class), color = "white", linewidth = 0.15) + scale_fill_viridis_d(option = "C", direction = -1, name = "Income class") + coord_sf(datum = NA) + labs( title = "Pooled classes remain map-ready", subtitle = "1955 uses breaks estimated from the full 1929-2009 panel" ) + map_theme() ``` ## Classic Markov The classic Markov matrix estimates movement between income classes from one year to the next, pooling transitions across all states and adjacent years. The `print()` method gives the transition probabilities; `transition_matrix()` can return either probabilities or raw counts for downstream work. ```{r, fig.alt="Heatmap of classic Markov transition probabilities for US state per-capita income, 1929 to 2009."} classic <- markov_dynamics(classes, name, year, class) classic transition_matrix(classic, "count") steady_state(classic) plot_transition_matrix(classic) ``` The diagonal dominance of the matrix — most states stay in their starting quintile from one year to the next — is the headline finding from this panel and recovers the result Rey (2001) reports. The stationary distribution is a mechanical summary of this empirical matrix, not a forecast. Here Q5 is absorbing in the observed one-year transitions, so the long-run vector puts all mass in Q5. ## Spatial Markov Spatial Markov conditions transition probabilities on the starting class of each unit's spatial lag. The expectation is that being surrounded by rich (or poor) neighbors shifts the transition odds beyond what the unit's own class would predict. ```{r} geom <- us_states |> filter(NAME %in% usjoin$name) |> arrange(NAME) |> mutate( nb = st_contiguity(geometry), wt = st_weights(nb) ) panel <- usjoin |> filter(name %in% states) |> arrange(name, year) ``` `geometry` is the preferred way to pass spatial weights: an `sf` tibble with one row per unit and `nb` / `wt` list-columns produced by `sfdep`. Carrying weights as columns on the geography keeps the spatial frame, the weights, and their row-standardization choice in one tidy object. ```{r, fig.alt="Faceted heatmaps of spatial Markov transition probabilities for US state per-capita income, by spatial-lag quintile."} spatial <- spatial_markov(panel, name, year, income, geometry = geom, k = 5) lag_intervals(spatial) transition_matrix(spatial, "probability", lag_class = "Q1") plot_spatial_markov(spatial) ``` Reading the facets left to right: states whose neighbors sit in the lowest income lag class (Q1) are much stickier at the bottom of the distribution than the pooled transition matrix would suggest. The spatial context conditioning is what spatial Markov is built to surface. Grey rows indicate lag-class/state combinations with no observed transitions, so their probabilities are undefined rather than zero. The lag classes themselves are map-ready. The map below uses 1994 because the spatial-lag distribution is split across two classes in that year; at the start of the panel, all states have Q1 spatial lags under pooled nominal-income breaks. Each state's class below is based on the weighted income of its neighbors, not on its own income. ```{r, fig.alt="Map of contiguous US states showing each state's 1994 spatial-lag income class based on neighboring states.", fig.width=6.8, fig.height=4} lag_map <- spatial$transitions |> filter(from_time == 1994) |> distinct(id, lag_class, spatial_lag) |> left_join(geom |> select(NAME, geometry), by = c("id" = "NAME")) |> st_as_sf() ggplot(lag_map) + geom_sf(aes(fill = lag_class), color = "white", linewidth = 0.15) + scale_fill_viridis_d(option = "C", direction = -1, name = "Spatial-lag class") + coord_sf(datum = NA) + labs( title = "Spatial Markov conditions on neighboring-state income", subtitle = "Spatial-lag class in 1994" ) + map_theme() ``` ## Rank Mobility Rank mobility is intentionally simple in v0.1: an endpoint comparison of unit ranks between the first and last observed periods. It is descriptive and map-ready, not the full PySAL rank-method suite. Positive values mean a state moved up the income ranking; negative values mean it moved down. ```{r, fig.alt="Map of endpoint rank mobility for US states, 1929 to 2009, with positive values indicating upward rank movement."} mobility_panel <- usjoin |> filter(name %in% states) |> left_join(geom |> select(NAME), by = c("name" = "NAME")) |> st_as_sf() mobility <- rank_mobility(mobility_panel, name, year, income) mobility |> st_drop_geometry() |> arrange(desc(abs_rank_change)) |> select(name, start_rank, end_rank, rank_change) |> head() plot_rank_mobility(mobility) + coord_sf(datum = NA) + labs( title = "Endpoint rank mobility, 1929 to 2009", subtitle = "Positive values indicate upward movement in the state income ranking" ) + map_theme() ``` For shorter-run churn, use adjacent-period comparisons. The same result schema is returned, but each row describes movement from one observed period to the next. ```{r} adjacent_mobility <- rank_mobility(panel, name, year, income, compare = "adjacent") adjacent_mobility |> arrange(desc(abs_rank_change)) |> select(name, year, to_time, rank_change) |> head() ``` ## Output schemas Each result object exposes a tidy `transitions` table and a long `matrix` table keyed by class labels. Inspecting them directly is the simplest way to learn the schema. ```{r} classic$transitions |> head() spatial$transitions |> select(id, from_time, to_time, lag_class, transition, spatial_lag) |> head() ``` For programmatic access, every griddy object also has `as.data.frame()` and the S3 generics `transition_matrix()`, `class_intervals()`, and (for spatial objects) `lag_intervals()`.