Simpson’s Paradox
Edward H. Simpson · 1951; earlier instances back to 1899
The puzzle
A trend that appears in groups of data can disappear or reverse when the groups are combined.
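Stated with counts, the reversal needs nothing beyond fractions. In the sketch below, group A has $a_i$ successes out of $n_i$ trials in stratum $i$, and group B has $b_i$ out of $m_i$. The numeric instance is made up for illustration. A beats B within each stratum but loses once the strata are pooled, because most of A's trials fall in the stratum where everyone does worse.

```latex
\frac{a_1}{n_1} > \frac{b_1}{m_1}
\quad\text{and}\quad
\frac{a_2}{n_2} > \frac{b_2}{m_2}
\quad\text{do not imply}\quad
\frac{a_1 + a_2}{n_1 + n_2} > \frac{b_1 + b_2}{m_1 + m_2}

\frac{2}{8} > \frac{1}{5}, \qquad
\frac{4}{5} > \frac{6}{8}, \qquad
\text{yet} \qquad
\frac{2 + 4}{8 + 5} = \frac{6}{13} < \frac{7}{13} = \frac{1 + 6}{5 + 8}
```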
Note
The classic instance: in 1973, UC Berkeley’s graduate admissions appeared to favor men. Broken down by department, most departments showed no bias against women, and several admitted women at slightly higher rates than men. Women were applying disproportionately to departments with low admit rates for everyone, which pulled their aggregate admit rate down. The aggregate trend was real; the per-department trend was real; both were correct, and they pointed in opposite directions.
The paradox is most often presented as a statistical curiosity. The deeper problem is that almost every “controlled for” claim in observational data depends on choosing the right grouping, and the data alone cannot tell you which grouping is right. The choice is causal, not statistical: you cannot make it without a theory of how the world works.
Simpson published the canonical 1951 paper; Pearson and Yule had stumbled onto the same structure half a century earlier without naming it.
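A minimal Python sketch of the Berkeley mechanism follows. It uses hypothetical counts for two departments. The names `easy_dept` and `hard_dept` and every number are invented for illustration and are not the 1973 data. In this toy table, women are admitted at a higher rate in each department but apply mostly to the department with the lower admit rate. The hypothetical `application_shares` helper shows why the pooled figure reverses: each sex's pooled rate is the average of its per-department rates, weighted by where that sex applied.

```python
from fractions import Fraction

# Hypothetical (admitted, applied) counts per department and sex.
# Invented to mimic the pattern described above; NOT the 1973 Berkeley data.
ADMISSIONS = {
    "easy_dept": {"men": (80, 100), "women": (17, 20)},
    "hard_dept": {"men": (4, 20), "women": (25, 100)},
}


def rate(admitted: int, applied: int) -> Fraction:
    """Admission rate as an exact fraction (avoids float rounding)."""
    return Fraction(admitted, applied)


def per_department_rates(data):
    """Admission rate for each sex within each department."""
    return {
        dept: {sex: rate(*counts) for sex, counts in by_sex.items()}
        for dept, by_sex in data.items()
    }


def pooled_rates(data):
    """Admission rate for each sex after summing counts across departments."""
    totals = {}
    for by_sex in data.values():
        for sex, (admitted, applied) in by_sex.items():
            a, n = totals.get(sex, (0, 0))
            totals[sex] = (a + admitted, n + applied)
    return {sex: rate(a, n) for sex, (a, n) in totals.items()}


def application_shares(data, sex):
    """Fraction of one sex's applicants who applied to each department.

    The pooled rate equals the per-department rates weighted by these
    shares, which is where the reversal comes from.
    """
    total = sum(by_sex[sex][1] for by_sex in data.values())
    return {dept: Fraction(by_sex[sex][1], total) for dept, by_sex in data.items()}


if __name__ == "__main__":
    for dept, rates in per_department_rates(ADMISSIONS).items():
        print(dept, {sex: float(r) for sex, r in rates.items()})
    print("pooled", {sex: float(r) for sex, r in pooled_rates(ADMISSIONS).items()})
```

Running it prints per-department rates of 0.8 vs 0.85 and 0.2 vs 0.25 (men vs women), then pooled rates of 0.7 vs 0.35. Exact fractions are used so that the weighted-average identity can be checked with `==` rather than a tolerance.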