---
title: "Geospatial Distribution Dynamics With griddy"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Geospatial Distribution Dynamics With griddy}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, message=FALSE, warning=FALSE}
library(griddy)
library(dplyr)
library(ggplot2)
library(sf)
library(sfdep)
library(spData)
library(tidyr)
```

```{r map-theme, include=FALSE}
map_theme <- function() {
  theme_void(base_size = 11) +
    theme(
      legend.position = "bottom",
      plot.title.position = "plot",
      strip.text = element_text(face = "bold")
    )
}
```

`griddy` works on long spatial panels: one row per place per period, keyed by
explicit `id`, `time`, and `value` columns. The bundled `usjoin` panel contains
48 contiguous US states, per-capita personal income, and annual observations
from 1929 to 2009. It mirrors the canonical PySAL `giddy` example used in
Rey (2001).

```{r}
data(usjoin)

usjoin
```

## Classification

Pooled quantile classification cuts the entire panel into five bins and reuses
the same breakpoints for every year. That keeps class meanings comparable
across periods, which matters because year-specific quantiles can hide absolute
movement. A state in Q5 in 2009 is being compared to the same income scale as
a state in Q5 in 1929, not merely to its contemporaries.

```{r}
classes <- classify_dynamics(usjoin, name, year, income, k = 5)

class_intervals(classes)
```

Nominal income growth means the endpoints are nearly homogeneous under pooled
breaks: the early panel sits near the bottom of the pooled distribution and the
late panel sits near the top. A mid-panel year is a better mapped diagnostic
because it shows both the absolute class scale and the cross-state ordering.

```{r}
states <- us_states |>
  st_drop_geometry() |>
  filter(NAME %in% usjoin$name) |>
  pull(NAME)

geom <- us_states |>
  filter(NAME %in% usjoin$name) |>
  arrange(NAME)
```

```{r, fig.alt="Map of contiguous US states showing pooled income classes in 1955 using the same five class intervals estimated from the full 1929 to 2009 panel.", fig.width=6.8, fig.height=4}
class_map <- classes |>
  filter(year == 1955) |>
  left_join(geom |> select(NAME, geometry), by = c("name" = "NAME")) |>
  st_as_sf()

ggplot(class_map) +
  geom_sf(aes(fill = class), color = "white", linewidth = 0.15) +
  scale_fill_viridis_d(option = "C", direction = -1, name = "Income class") +
  coord_sf(datum = NA) +
  labs(
    title = "Pooled classes remain map-ready",
    subtitle = "1955 uses breaks estimated from the full 1929-2009 panel"
  ) +
  map_theme()
```

## Classic Markov

The classic Markov matrix estimates movement between income classes from one
year to the next, pooling transitions across all states and adjacent years.
The `print()` method gives the transition probabilities; `transition_matrix()`
can return either probabilities or raw counts for downstream work.

```{r, fig.alt="Heatmap of classic Markov transition probabilities for US state per-capita income, 1929 to 2009."}
classic <- markov_dynamics(classes, name, year, class)

classic
transition_matrix(classic, "count")
steady_state(classic)

plot_transition_matrix(classic)
```

The diagonal dominance of the matrix — most states stay in their starting
quintile from one year to the next — is the headline finding from this panel
and recovers the result Rey (2001) reports. The stationary distribution is a
mechanical summary of this empirical matrix, not a forecast. Here Q5 is
absorbing in the observed one-year transitions, so the long-run vector puts all
mass in Q5.

## Spatial Markov

Spatial Markov conditions transition probabilities on the starting class of
each unit's spatial lag. The expectation is that being surrounded by rich (or
poor) neighbors shifts the transition odds beyond what the unit's own class
would predict.

```{r}
geom <- us_states |>
  filter(NAME %in% usjoin$name) |>
  arrange(NAME) |>
  mutate(
    nb = st_contiguity(geometry),
    wt = st_weights(nb)
  )

panel <- usjoin |>
  filter(name %in% states) |>
  arrange(name, year)
```

`geometry` is the preferred way to pass spatial weights: an `sf` tibble with
one row per unit and `nb` / `wt` list-columns produced by `sfdep`. Carrying
weights as columns on the geography keeps the spatial frame, the weights, and
their row-standardization choice in one tidy object.

```{r, fig.alt="Faceted heatmaps of spatial Markov transition probabilities for US state per-capita income, by spatial-lag quintile."}
spatial <- spatial_markov(panel, name, year, income, geometry = geom, k = 5)

lag_intervals(spatial)
transition_matrix(spatial, "probability", lag_class = "Q1")

plot_spatial_markov(spatial)
```

Reading the facets left to right: states whose neighbors sit in the lowest
income lag class (Q1) are much stickier at the bottom of the distribution than
the pooled transition matrix would suggest. The spatial context conditioning is
what spatial Markov is built to surface. Grey rows indicate lag-class/state
combinations with no observed transitions, so their probabilities are undefined
rather than zero.

The lag classes themselves are map-ready. The map below uses 1994 because the
spatial-lag distribution is split across two classes in that year; at the start
of the panel, all states have Q1 spatial lags under pooled nominal-income
breaks. Each state's class below is based on the weighted income of its
neighbors, not on its own income.

```{r, fig.alt="Map of contiguous US states showing each state's 1994 spatial-lag income class based on neighboring states.", fig.width=6.8, fig.height=4}
lag_map <- spatial$transitions |>
  filter(from_time == 1994) |>
  distinct(id, lag_class, spatial_lag) |>
  left_join(geom |> select(NAME, geometry), by = c("id" = "NAME")) |>
  st_as_sf()

ggplot(lag_map) +
  geom_sf(aes(fill = lag_class), color = "white", linewidth = 0.15) +
  scale_fill_viridis_d(option = "C", direction = -1, name = "Spatial-lag class") +
  coord_sf(datum = NA) +
  labs(
    title = "Spatial Markov conditions on neighboring-state income",
    subtitle = "Spatial-lag class in 1994"
  ) +
  map_theme()
```

## Rank Mobility

Rank mobility is intentionally simple in v0.1: an endpoint comparison of unit
ranks between the first and last observed periods. It is descriptive and
map-ready, not the full PySAL rank-method suite. Positive values mean a state
moved up the income ranking; negative values mean it moved down.

```{r, fig.alt="Map of endpoint rank mobility for US states, 1929 to 2009, with positive values indicating upward rank movement."}
mobility_panel <- usjoin |>
  filter(name %in% states) |>
  left_join(geom |> select(NAME), by = c("name" = "NAME")) |>
  st_as_sf()

mobility <- rank_mobility(mobility_panel, name, year, income)

mobility |>
  st_drop_geometry() |>
  arrange(desc(abs_rank_change)) |>
  select(name, start_rank, end_rank, rank_change) |>
  head()

plot_rank_mobility(mobility) +
  coord_sf(datum = NA) +
  labs(
    title = "Endpoint rank mobility, 1929 to 2009",
    subtitle = "Positive values indicate upward movement in the state income ranking"
  ) +
  map_theme()
```

For shorter-run churn, use adjacent-period comparisons. The same result schema
is returned, but each row describes movement from one observed period to the
next.

```{r}
adjacent_mobility <- rank_mobility(panel, name, year, income, compare = "adjacent")

adjacent_mobility |>
  arrange(desc(abs_rank_change)) |>
  select(name, year, to_time, rank_change) |>
  head()
```

## Output schemas

Each result object exposes a tidy `transitions` table and a long `matrix` table
keyed by class labels. Inspecting them directly is the simplest way to learn
the schema.

```{r}
classic$transitions |> head()
spatial$transitions |> select(id, from_time, to_time, lag_class, transition, spatial_lag) |> head()
```

For programmatic access, every griddy object also has `as.data.frame()` and the
S3 generics `transition_matrix()`, `class_intervals()`, and (for spatial
objects) `lag_intervals()`.