Does Correlation Imply Causation? What Scientists Say

No, correlation does not imply causation. A correlation between two variables means they move together in some pattern, but it tells you nothing about whether one actually causes the other. This is one of the most frequently repeated principles in statistics, yet it remains one of the most frequently violated in everyday reasoning, media headlines, and even published research.

What Correlation Actually Measures

Correlation is a statistical measure that describes the strength and direction of a relationship between two variables. When ice cream sales go up and drowning deaths also go up, those two variables are positively correlated. The correlation coefficient, a number between -1 and +1, quantifies how tightly the two move together. A value near +1 means they rise and fall in sync; near -1 means one rises as the other falls; near 0 means there’s no consistent pattern.

That number captures pattern, not mechanism. It can’t tell you whether one variable is driving the other, whether something else is driving both, or whether the whole relationship is a coincidence. The correlation coefficient, by design, says nothing about cause and effect.
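
To make the “pattern, not mechanism” point concrete, here is a minimal sketch in Python of the Pearson coefficient: the average product of the two variables’ z-scores. The monthly figures are hypothetical, invented for illustration.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson correlation: mean product of z-scores, always in [-1, +1]."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * pstdev(xs) * pstdev(ys))

# Hypothetical figures for six summer months: ice cream sales
# (thousands of units) and drowning deaths.
ice_cream = [20, 35, 50, 70, 90, 95]
drownings = [2, 3, 5, 8, 10, 11]

# Near +1: the two series rise in lockstep. The coefficient records that
# pattern and nothing else -- no direction, no mechanism.
print(round(pearson_r(ice_cream, drownings), 3))
```

The function would report a coefficient very close to +1 for these series, yet nothing in the arithmetic distinguishes “A causes B” from “B causes A” or “summer causes both.”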

Why Correlated Things Often Share a Hidden Cause

The most common reason two unrelated things appear connected is a confounding variable: a third factor influencing both. Ice cream sales and home burglaries both rise in summer, not because buying ice cream inspires crime, but because warmer weather independently increases both. Temperature is the confounder hiding behind the correlation.
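
A quick simulation makes the confounder visible. In this hypothetical sketch, temperature independently drives both ice cream sales and burglaries; the two outcomes never influence each other. They correlate strongly anyway, and the correlation vanishes once temperature is held fixed, here by correlating the residuals left after regressing each outcome on temperature.

```python
import random
from statistics import mean, pstdev

random.seed(42)

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * pstdev(xs) * pstdev(ys))

def residuals(ys, xs):
    """What is left of ys after removing the best linear fit on xs."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

# Hypothetical model: temperature causes both outcomes; they never touch.
temp = [random.gauss(20, 8) for _ in range(500)]
sales = [3.0 * t + random.gauss(0, 10) for t in temp]
burglaries = [0.5 * t + random.gauss(0, 3) for t in temp]

raw = pearson_r(sales, burglaries)            # strong, yet purely confounded
controlled = pearson_r(residuals(sales, temp), residuals(burglaries, temp))  # near zero
print(round(raw, 2), round(controlled, 2))
```

Residualizing on the confounder is the simulation’s stand-in for “holding temperature constant”; once that’s done, the sales-burglary link has nothing left to explain.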

Confounders show up everywhere. One study found that adults who preferred beer tended to weigh more than those who preferred wine. The lurking variable was gender: men were more likely to prefer beer and also weighed more on average. Without accounting for that, you’d wrongly conclude that beer itself causes higher body weight relative to wine.

Sometimes there’s no hidden cause at all. The relationship is pure coincidence amplified by large datasets. The number of films Nicolas Cage appeared in each year once tracked closely with the number of female editors of the Harvard Law Review. Per capita cheese consumption in the U.S. correlated with the number of people who died by becoming tangled in their bedsheets. Margarine consumption correlated with divorce rates. None of these relationships mean anything. With enough variables and enough years of data, random patterns inevitably emerge that look convincing on a chart.
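
The “enough variables, enough years” effect is easy to reproduce. This sketch, using the Python standard library and purely invented data, generates 50 independent random walks of ten annual values, series that by construction have nothing to do with each other, then searches every pair for the most convincing-looking relationship.

```python
import random
from itertools import accumulate, combinations
from statistics import mean, pstdev

random.seed(7)

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * pstdev(xs) * pstdev(ys))

# 50 independent random walks, 10 "annual" observations each --
# pure noise with no connections anywhere.
walks = [list(accumulate(random.gauss(0, 1) for _ in range(10))) for _ in range(50)]

# Search all 1,225 pairs for the most impressive-looking correlation.
best = max(abs(pearson_r(a, b)) for a, b in combinations(walks, 2))
print(round(best, 2))  # comfortably high, despite zero real connection
```

With 1,225 pairs to choose from, some pair almost always correlates strongly by chance alone, which is exactly how the cheese-and-bedsheets charts get made.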

The Smoking Problem: How Scientists Proved Causation

The gap between correlation and causation isn’t just an academic puzzle. It has life-or-death consequences, and the tobacco debate is the clearest example. By the mid-20th century, researchers had strong correlational data linking cigarette smoking to lung cancer. The tobacco industry argued, correctly in a narrow technical sense, that correlation alone couldn’t prove cigarettes caused cancer. Maybe smokers shared some other trait that made them cancer-prone.

The 1964 U.S. Surgeon General’s Report broke through that argument by assembling multiple independent lines of evidence. Researchers showed that smoking preceded cancer (not the reverse), that heavier smokers got cancer at higher rates (a dose-response relationship), that the link appeared consistently across different populations and study designs, and that the chemicals in cigarette smoke could plausibly damage lung tissue. Today, smoking is known to cause 87 percent of lung cancer deaths and 79 percent of all cases of chronic obstructive pulmonary disease in the United States. At least 70 chemicals in cigarette smoke are confirmed carcinogens.

The case against tobacco wasn’t built on any single study. It was built on a framework for evaluating when correlation points toward real causation.

The Checklist Scientists Use to Evaluate Causation

In 1965, epidemiologist Austin Bradford Hill laid out a set of considerations, now called the Bradford Hill criteria, that researchers still use to judge whether a correlation likely reflects a true cause-and-effect relationship. No single criterion is sufficient on its own, and not every criterion needs to be met, but the more that are satisfied, the stronger the case.

  • Strength: A large effect is harder to explain away as a fluke or a confounding artifact than a tiny one.
  • Consistency: The relationship shows up repeatedly, in different populations, at different times, using different methods.
  • Temporality: The proposed cause must come before the effect. This is the one non-negotiable criterion. If A doesn’t precede B, A cannot cause B.
  • Dose-response: More exposure leads to more of the outcome, following a predictable pattern.
  • Plausibility: There’s a credible biological or physical mechanism that could explain how A leads to B.
  • Coherence: The causal claim doesn’t contradict what’s already well established in the field.
  • Experiment: Evidence from controlled experiments, where researchers deliberately manipulate the variable, strengthens the case considerably.
  • Specificity: The exposure leads to a particular outcome rather than a vague collection of effects.
  • Analogy: Similar exposures are already known to produce similar outcomes.

Bradford Hill himself cautioned that no statistical test, however significant, can answer the question of causation on its own. These criteria are a judgment framework, not a formula.

Why Statistical Significance Doesn’t Help

A common misunderstanding is that a “statistically significant” result (typically a p-value below 0.05) means the relationship is real and causal. It doesn’t. A p-value tells you how likely a pattern at least as strong as the one observed would be to appear by chance alone if there were truly no relationship. That’s a much narrower statement than most people assume.

A small p-value can arise from flawed study design, biased data collection, or a model that doesn’t fit the data well. The 0.05 cutoff itself is arbitrary, a convention born of tradition and convenience rather than any deep mathematical law. Statistical significance was originally proposed as a signal that a result deserved further scrutiny. Over time, it came to be treated as proof, which it never was. As Hill put it, formal tests of significance “contribute nothing to the ‘proof’ of our hypothesis” beyond reminding us what chance can produce.

How Researchers Actually Establish Cause and Effect

The gold standard for demonstrating causation is the randomized controlled trial. In an RCT, participants are randomly assigned to either receive an intervention or not. Randomization is the key ingredient: it balances both known and unknown characteristics between the two groups, so any difference in outcomes can reasonably be attributed to the intervention itself rather than to some lurking variable.

Good RCTs also use blinding, where participants (and sometimes researchers) don’t know who’s receiving the intervention. This prevents expectations from influencing results. No single study proves causality outright, but a well-designed RCT comes closer than any other method.
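
Randomization’s balancing act can be seen directly. In this hypothetical sketch, each participant carries an unmeasured trait (say, a health-consciousness score) that strongly affects the outcome; a fair coin flip spreads that trait almost evenly across the two arms without anyone ever measuring it.

```python
import random
from statistics import mean

random.seed(3)

# 1,000 hypothetical participants, each with an unmeasured trait
# (health-consciousness) that influences the outcome on its own.
people = [{"health_conscious": random.gauss(0, 1)} for _ in range(1000)]

# Randomize: flip a fair coin for each participant.
treatment, control = [], []
for p in people:
    (treatment if random.random() < 0.5 else control).append(p)

# The lurking trait ends up balanced across arms, so a difference in
# outcomes can be credited to the intervention, not to the trait.
t_mean = mean(p["health_conscious"] for p in treatment)
c_mean = mean(p["health_conscious"] for p in control)
print(round(t_mean, 3), round(c_mean, 3))
```

The same balancing happens simultaneously for every trait, measured or not, which is what no amount of after-the-fact statistical adjustment can fully guarantee.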

RCTs aren’t always possible, though. You can’t randomly assign people to smoke for 30 years or to live in poverty. When experiments are off the table, researchers turn to observational data and lean on the Bradford Hill criteria, natural experiments, and statistical techniques that try to mimic randomization after the fact. In time-series data, a method called Granger causality tests whether knowing the history of one variable improves predictions of another. It’s used in economics, neuroscience, and climate science, but even its creator was careful to note it measures predictive power, not true causation in the physical sense.
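
The Granger idea can be sketched in a few lines: fit Y on its own past, then fit Y on its own past plus X’s past, and see how much the prediction error shrinks. The data here are simulated so that X genuinely drives Y one step later; this is a hypothetical toy setup, while real analyses use more lags and formal F-tests.

```python
import random
from statistics import mean

random.seed(11)

def center(xs):
    m = mean(xs)
    return [x - m for x in xs]

def rss_one(y, a):
    """Least-squares residual sum of squares: y predicted from one variable."""
    ya, aa = center(y), center(a)
    b = sum(x * t for x, t in zip(aa, ya)) / sum(x * x for x in aa)
    return sum((t - b * x) ** 2 for x, t in zip(aa, ya))

def rss_two(y, a, b):
    """Least-squares residual sum of squares: y predicted from two variables."""
    ya, aa, bb = center(y), center(a), center(b)
    saa = sum(x * x for x in aa)
    sbb = sum(x * x for x in bb)
    sab = sum(p * q for p, q in zip(aa, bb))
    say = sum(x * t for x, t in zip(aa, ya))
    sby = sum(x * t for x, t in zip(bb, ya))
    det = saa * sbb - sab * sab          # Cramer's rule on the normal equations
    c1 = (say * sbb - sby * sab) / det
    c2 = (sby * saa - say * sab) / det
    return sum((t - c1 * p - c2 * q) ** 2 for p, q, t in zip(aa, bb, ya))

# Hypothetical series: x drives y with a one-step delay; nothing drives x.
x = [random.gauss(0, 1) for _ in range(501)]
y = [0.0]
for t in range(1, 501):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0, 1))

y_now, x_now = y[1:], x[1:]
y_past, x_past = y[:-1], x[:-1]

# Fractional drop in prediction error from adding the other series' past.
gain_xy = 1 - rss_two(y_now, y_past, x_past) / rss_one(y_now, y_past)  # large
gain_yx = 1 - rss_two(x_now, x_past, y_past) / rss_one(x_now, x_past)  # near zero
print(round(gain_xy, 2), round(gain_yx, 2))
```

X’s history sharply improves forecasts of Y, while Y’s history barely helps with X, so X “Granger-causes” Y; as the section notes, that is a statement about predictive power, not physical mechanism.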

How to Spot the Mistake in Everyday Life

The logical error of assuming correlation equals causation has a Latin name: “cum hoc ergo propter hoc,” meaning “with this, therefore because of this.” You’ll encounter it constantly once you know what to look for.

When hormone replacement therapy became widespread, doctors noticed that women taking HRT seemed to have lower rates of coronary heart disease. The correlation was real. But it turned out that women who chose HRT tended to be wealthier, more health-conscious, and more likely to exercise. When randomized trials were finally conducted, HRT actually increased heart disease risk slightly. The observational correlation had pointed in the exact opposite direction of the truth.

Whenever you see a headline claiming that one thing “is linked to” another, ask three questions. Could something else be causing both? Which came first? And was this an experiment with random assignment, or just a pattern someone noticed in existing data? Those three questions will filter out most of the noise. A correlation is a starting point for investigation, never an endpoint.