Paradox of the Day

One puzzle a day — logical, philosophical, statistical, temporal.

↑↑ → ↓

Simpson’s Paradox

Edward H. Simpson · 1951; earlier instances back to 1899

The puzzle

A trend that appears in groups of data can disappear or reverse when the groups are combined.

Domain: Statistics · Causation
Attribution: Edward H. Simpson
Date: 1951; earlier instances back to 1899

Note

The classic instance: in 1973, UC Berkeley’s graduate admissions appeared to favor men, who were admitted at a noticeably higher overall rate than women. Broken down by department, the gap largely vanished: most departments showed no significant bias against women, and the departmental data taken together leaned slightly in women’s favor. The reason was that women were applying disproportionately to departments with low admit rates. The aggregate trend was real; the per-department trend was real; both were correct, and they pointed in opposite directions.

The paradox is most often presented as a statistical curiosity, but the deeper problem is that almost every “controlled for” claim in observational data depends on choosing the right grouping, and the data alone cannot tell you which grouping is right. The choice is causal, not statistical, and you cannot make it without a theory of how the world works.

Simpson published the canonical 1951 paper; Pearson and Yule had stumbled into the same structure half a century earlier without naming it.
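
To make the reversal concrete, here is a minimal Python sketch. The counts are invented for illustration and are not the Berkeley figures; the department names and the helper functions `per_department_rates` and `aggregate_rates` are likewise hypothetical. It assumes only two departments, one easy and one hard, with women applying mostly to the hard one.

```python
# Hypothetical counts, invented for illustration (NOT the Berkeley data).
# Each entry is (admitted, applicants).
applications = {
    "easy_dept": {"men": (80, 100), "women": (18, 20)},
    "hard_dept": {"men": (4, 20), "women": (30, 100)},
}


def per_department_rates(data):
    """Admit rate for each group within each department."""
    return {
        dept: {group: admitted / applicants
               for group, (admitted, applicants) in groups.items()}
        for dept, groups in data.items()
    }


def aggregate_rates(data):
    """Admit rate for each group after pooling all departments."""
    totals = {}
    for groups in data.values():
        for group, (admitted, applicants) in groups.items():
            prev_admitted, prev_applicants = totals.get(group, (0, 0))
            totals[group] = (prev_admitted + admitted,
                             prev_applicants + applicants)
    return {group: admitted / applicants
            for group, (admitted, applicants) in totals.items()}


by_dept = per_department_rates(applications)
overall = aggregate_rates(applications)

for dept, rates in by_dept.items():
    print(f"{dept}: men {rates['men']:.0%}, women {rates['women']:.0%}")
print(f"overall: men {overall['men']:.0%}, women {overall['women']:.0%}")
```

With these made-up numbers, women have the higher admit rate in each department (90% vs 80%, and 30% vs 20%), yet men have the higher pooled rate (70% vs 40%), because most women applied to the department that rejects most applicants. The code computes both views but cannot say which one answers the question you care about; that choice depends on the causal story.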
