A correlation is most likely a causation when the suspected cause comes before the effect, when higher doses produce stronger effects, when a plausible biological or physical mechanism connects the two, and when the link holds up across different populations and study designs. No single checkmark proves causation, but the more of these criteria a correlation satisfies, the stronger the case becomes. The framework scientists use to make this judgment has been refined over decades, and the classic example that anchors it all is the link between smoking and lung cancer.
The Five Signs a Correlation Is Probably Causal
In 1965, epidemiologist Austin Bradford Hill laid out a set of criteria still used today to evaluate whether a statistical association reflects a true cause-and-effect relationship. Nine criteria made the original list, but five carry the most practical weight.
The cause comes first. This sounds obvious, but it’s the one non-negotiable requirement. If you can’t establish that X happened before Y, you can’t claim X caused Y. With slow-developing diseases, this gets tricky. Did the dietary pattern precede the illness, or did early, undetected illness change the person’s diet? Temporality is the only criterion Hill considered absolutely essential.
More exposure means more effect. When the death rate from lung cancer rises linearly with the number of cigarettes smoked per day, that dose-response pattern is powerful evidence. A coincidental correlation rarely scales so neatly. If one glass of a contaminated drink makes you mildly sick and three glasses make you very sick, the link between the contaminant and the illness becomes hard to dismiss.
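The dose-response idea can be made concrete in a few lines of code. This is a hypothetical sketch: the dose and relative-risk numbers below are invented for illustration, and pearson_r is a small hand-rolled helper, not a library function.

```python
# Invented illustration of a dose-response pattern: risk that rises
# steadily with exposure produces a near-perfect linear correlation.
doses = [0, 5, 10, 20, 40]                     # cigarettes per day (hypothetical)
relative_risk = [1.0, 4.2, 8.1, 15.9, 31.5]    # rises roughly in step with dose

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(f"dose-response correlation: {pearson_r(doses, relative_risk):.3f}")
```

A coincidental association would not track exposure this tightly; here the correlation comes out above 0.99 because the risk scales with the dose.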
The effect is large. A strong association is harder to explain away as a fluke or the result of some hidden variable. Average smokers in the 1960s had a nine- to ten-fold increased risk of developing lung cancer compared to nonsmokers. Heavy smokers faced at least a twenty-fold risk. An effect that large demands an explanation, and any hidden variable capable of producing it would have to be so strong that it would be easy to detect on its own.
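For a sense of how strength of association is quantified, here is the arithmetic behind a nine-fold relative risk. The 2x2 table is invented to make the calculation concrete; the counts are not historical data.

```python
# Relative risk from a hypothetical 2x2 table (all counts invented):
#                 lung cancer   no lung cancer
# smokers              90            9910
# nonsmokers           10            9990
risk_smokers = 90 / (90 + 9910)       # 0.009
risk_nonsmokers = 10 / (10 + 9990)    # 0.001
relative_risk = risk_smokers / risk_nonsmokers
print(f"relative risk: {relative_risk:.1f}")
```

A relative risk of 9 means smokers in this toy table develop the disease at nine times the nonsmoker rate, which is the kind of magnitude no subtle confounder can quietly produce.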
A mechanism makes sense. When scientists can explain how X physically or biologically produces Y, the case strengthens considerably. You don’t always need this; Hill himself noted that what counts as “biologically plausible” depends on what science knows at the time. But having a clear mechanism narrows the field. It helps researchers design better studies by focusing on the right measurements and the most susceptible groups.
Different studies reach the same conclusion. Hill put a good deal of weight on “similar results reached in quite different ways.” If a prospective study following people forward in time, a retrospective study looking backward, and a lab study in animals all point the same direction, that consistency across methods is much harder to attribute to a shared flaw.
How Smoking Became the Textbook Case
For decades, tobacco companies argued that the link between smoking and cancer might be explained by a confounding variable: maybe smokers also drank more alcohol, or had some genetic predisposition that independently drove both the smoking habit and the cancer. This is actually a reasonable scientific objection in principle. The reason it ultimately failed is that smoking met virtually every causal criterion at once.
Epidemiologists used large-scale, long-term surveys to establish that smoking preceded lung cancer by years. Pathologists corroborated the statistical relationship with laboratory evidence of damage in smokers’ lung tissue. The risk climbed with the number of cigarettes smoked daily (dose-response) and fell after people quit (reversibility). The 1964 Surgeon General’s report held cigarette smoking responsible for a 70 percent increase in overall mortality among smokers compared to nonsmokers. No other environmental factor, whether air pollution, asbestos, or radioactive materials, could account for the epidemic rise of lung cancer in the twentieth century.
This case illustrates a key principle: causation is established by accumulating multiple lines of evidence, not by any single study.
What Makes Correlations Misleading
The most common trap is the confounding variable, a third factor that drives both the apparent cause and the apparent effect. The classic example: ice cream sales and drowning rates rise together in summer. Ice cream doesn’t cause drowning. Hot weather drives both. When you account for temperature, the correlation between ice cream and drowning disappears.
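That disappearance can be simulated directly. In this toy sketch (every distribution and parameter is invented), hot weather drives both ice-cream sales and drownings; the raw correlation between the two is strong, but after regressing temperature out of each variable it collapses toward zero.

```python
import random

random.seed(0)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """Residuals of ys after a simple linear regression on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hot weather (the confounder) drives both variables; neither causes the other.
n = 2000
temp = [random.gauss(20, 8) for _ in range(n)]
ice_cream = [t + random.gauss(0, 3) for t in temp]   # sales track temperature
drownings = [t + random.gauss(0, 3) for t in temp]   # so do drownings

raw = pearson_r(ice_cream, drownings)
adjusted = pearson_r(residuals(ice_cream, temp), residuals(drownings, temp))
print(f"raw r = {raw:.2f}; r after controlling for temperature = {adjusted:.2f}")
```

Taking residuals against the confounder is one simple way to "account for temperature"; once the shared driver is removed, nothing connects the two variables.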
Reverse causation is another pitfall. A study might find that people who exercise less have higher rates of depression. But does inactivity cause depression, or does depression cause inactivity? Without establishing which came first (temporality), you can’t tell the cart from the horse, as Hill put it.
Then there’s simple coincidence at scale. If you test thousands of variable pairs, some will correlate by pure chance. This is why replication matters. A finding that appears once in one dataset is interesting. The same finding appearing across multiple populations, study designs, and research groups starts to look causal.
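A quick simulation shows how cheap such coincidences are. Here thousands of pairs of completely unrelated random series are generated, and a nontrivial number of them correlate "strongly" by chance alone (the sample size and threshold are arbitrary choices for illustration).

```python
import random

random.seed(1)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Pairs of independent random series: any correlation is pure coincidence.
n_pairs, n_points = 5000, 20
strong = 0
for _ in range(n_pairs):
    xs = [random.random() for _ in range(n_points)]
    ys = [random.random() for _ in range(n_points)]
    if abs(pearson_r(xs, ys)) > 0.5:
        strong += 1

print(f"{strong} of {n_pairs} unrelated pairs show |r| > 0.5")
```

With short series, roughly a couple of percent of unrelated pairs clear the bar, which is exactly why a correlation mined from a large dataset means little until it replicates.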
The Counterfactual Test
At its core, causation comes down to one question: would the outcome have happened if the exposure had not occurred? This is called the counterfactual framework, and it traces back to the philosopher David Hume in the 18th century. He defined a cause as “an object followed by another, where, if the first object had not been, the second never had existed.”
The problem is that you can never directly observe the counterfactual. You can’t watch the same person both smoke and not smoke for 40 years. Randomized controlled trials solve this by creating two comparable groups, one exposed and one not, so that any difference in outcomes can be attributed to the exposure rather than to background differences between people. This is why RCTs sit at the top of the evidence hierarchy: randomization balances confounders, known and unknown alike, in a way no other study design can.
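The balancing effect of randomization is easy to demonstrate. In this sketch (the trait and sample size are made up), a hidden trait that could affect the outcome ends up nearly identical in both arms purely because assignment was random; no one had to measure the trait for it to be neutralized.

```python
import random

random.seed(2)

# A hidden trait (e.g. baseline health) is scattered evenly across both
# arms by random assignment, so it cannot confound the comparison.
n = 10_000
hidden_trait = [random.gauss(0, 1) for _ in range(n)]
treated = [random.random() < 0.5 for _ in range(n)]

n_treated = sum(treated)
mean_treated = sum(h for h, t in zip(hidden_trait, treated) if t) / n_treated
mean_control = sum(h for h, t in zip(hidden_trait, treated) if not t) / (n - n_treated)
print(f"hidden trait mean, treated arm: {mean_treated:+.3f}; "
      f"control arm: {mean_control:+.3f}")
```

The two arm means differ only by sampling noise, which shrinks as the trial grows; the same logic applies to every confounder at once, measured or not.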
But you can’t always run an experiment. You can’t randomly assign people to smoke for decades. In those situations, researchers use a technique called Mendelian randomization, which takes advantage of the fact that genetic variants are randomly assigned at conception. If a gene that makes people metabolize a substance differently also predicts their disease risk, that’s strong evidence the substance itself plays a causal role, because genes aren’t influenced by lifestyle, income, or any of the usual confounders. The alleles are set before any exposure or outcome occurs, effectively mimicking the random assignment of a controlled trial.
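A minimal sketch of that logic uses a Wald-ratio instrumental-variable estimate on simulated data (every parameter here, the variant frequency, the effect sizes, the confounder, is invented). The naive regression slope of outcome on exposure is inflated by the confounder, while the gene-based ratio recovers the true causal effect because the variant is independent of the confounder.

```python
import random

random.seed(3)

n = 100_000
true_effect = 0.5
gene = [random.random() < 0.3 for _ in range(n)]     # variant carrier? (random at conception)
confounder = [random.gauss(0, 1) for _ in range(n)]  # lifestyle etc., unmeasured
exposure = [1.0 * g + c + random.gauss(0, 1) for g, c in zip(gene, confounder)]
outcome = [true_effect * x + c + random.gauss(0, 1)
           for x, c in zip(exposure, confounder)]

def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

naive = cov(exposure, outcome) / cov(exposure, exposure)  # biased by the confounder
wald = cov(gene, outcome) / cov(gene, exposure)           # close to true_effect
print(f"naive slope: {naive:.2f}; IV (Wald) estimate: {wald:.2f}")
```

The Wald ratio divides the gene's effect on the outcome by its effect on the exposure; since the gene can only influence the outcome through the exposure, the confounder cancels out of the ratio.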
How to Evaluate a Claim Yourself
When you encounter a headline claiming that X causes Y, you can run through a quick mental checklist. First, does the timeline work? Did the suspected cause clearly precede the effect? Second, is there a dose-response pattern, where more of X leads to more of Y? Third, is the effect large enough to be hard to explain by coincidence or confounding? Fourth, is there a plausible mechanism connecting the two? And fifth, have independent researchers found the same result using different methods?
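That checklist can be written down as a crude scoring helper. This is purely illustrative, the criterion names and cutoffs are arbitrary, and it treats temporality as the one hard requirement, following Hill.

```python
# Hypothetical helper: the five-question checklist as a rough score.
# Not a validated instrument; the thresholds are illustrative only.
CRITERIA = ["temporality", "dose_response", "strength", "mechanism", "consistency"]

def causal_checklist(**answers: bool) -> str:
    met = [c for c in CRITERIA if answers.get(c)]
    if "temporality" not in met:
        return "not causal: the cause must precede the effect"
    if len(met) >= 4:
        return f"likely causal ({len(met)}/5 criteria met)"
    return f"uncertain ({len(met)}/5 criteria met); stay skeptical"

print(causal_checklist(temporality=True, dose_response=True,
                       strength=True, mechanism=True, consistency=True))
```

Note the asymmetry built into the function: failing temporality is disqualifying no matter what else is satisfied, while the other four criteria accumulate weight rather than acting as hard gates.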
A correlation that checks all five boxes is very likely causal. A correlation that checks only one or two, especially if the effect is small and the mechanism is unclear, deserves skepticism. Most of the spurious correlations that circulate online (per capita cheese consumption tracking with death by bedsheet tangling, for instance) fail on mechanism, dose-response, and consistency simultaneously. They exist because large datasets generate coincidental patterns at high volume.
The correlations most likely to be causal are the ones that survive every attempt to explain them away: the effect is big, it scales with exposure, it makes biological sense, it shows up in different populations, and removing the cause reduces the effect. That convergence of evidence is what separates a statistical curiosity from a genuine cause.

