Causation occurs when one event genuinely produces or brings about another, and the single requirement accepted across every field of study is temporal precedence: the cause must happen before the effect. Beyond that baseline, establishing that something truly caused something else depends on the context you’re working in, whether that’s medicine, law, statistics, or everyday reasoning. Each field has developed its own tests, but they all share a common structure rooted in a few core principles.
The One Rule Every Framework Agrees On
Temporal precedence is the only criterion universally considered mandatory for causation. If event A didn’t happen before event B, A cannot have caused B. This sounds obvious, but it’s surprisingly easy to get wrong. People who exercise more tend to be healthier, but does exercise cause better health, or do healthier people simply exercise more? Sorting out which came first is the foundation of every causal claim.
The 18th-century philosopher David Hume laid out three conditions for concluding that A causes B: the two events must be close together in space and time (contiguity), A must precede B (priority), and A and B must repeatedly occur together (constant conjunction). Modern frameworks have built on these ideas, but Hume’s insight that we never directly observe causation, only patterns, still shapes how researchers think about the problem.
The Bradford Hill Criteria in Health and Medicine
In 1965, Sir Austin Bradford Hill proposed nine criteria for evaluating whether an environmental exposure actually causes a disease. He developed them while building the case that tobacco smoking causes lung cancer, and they remain the most widely used checklist in epidemiology. No single criterion (except temporality) is required, and no fixed number of criteria must be met. Instead, the more criteria an association satisfies, the stronger the case for causation.
The nine criteria are:
- Temporality: The exposure comes before the disease.
- Strength of association: A larger effect size makes a causal link more likely. If heavy smokers are 20 times more likely to develop lung cancer than nonsmokers, that’s harder to explain away than a twofold increase.
- Consistency: Different researchers find the same relationship in different populations and settings.
- Specificity: The exposure is linked to a particular outcome in a particular group, not vaguely associated with everything.
- Dose-response relationship: More exposure leads to more effect. In skin allergy research, for example, higher doses of an allergen reliably produce stronger reactions in both human and animal studies, and preventing high peak concentrations can prevent workers from developing respiratory allergies entirely.
- Biological plausibility: There’s a known mechanism that could explain how the exposure produces the outcome. Toxicology and molecular biology contribute this evidence independently from the statistical patterns.
- Coherence: The causal interpretation doesn’t conflict with what’s already known about the disease.
- Experiment: Removing or reducing the exposure changes the outcome.
- Analogy: A similar exposure is already known to cause a similar outcome.
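The "strength of association" and "dose-response" criteria are usually quantified with a measure such as the relative risk. A minimal sketch, with invented counts chosen only to illustrate the arithmetic:

```python
def relative_risk(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Risk of disease in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return risk_exposed / risk_unexposed

# Hypothetical 2x2 table: 90 of 1,000 exposed people develop the disease,
# versus 10 of 1,000 unexposed people.
rr = relative_risk(90, 910, 10, 990)
print(round(rr, 1))  # 9.0 -- an association this strong is hard to explain away
```

A relative risk of 9 is the kind of "strength of association" Hill had in mind: small biases can plausibly manufacture a 1.2-fold difference, but rarely a ninefold one.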
Hill himself acknowledged that biological knowledge evolves, so plausibility is always limited by what scientists understand at the time. He also recognized that dose-response relationships aren’t always simple straight lines. Some effects have thresholds below which nothing happens, and some have complex curves where moderate exposure causes more harm than very high exposure.
How Medical Evidence Gets Graded
When medical organizations evaluate whether a treatment works or an exposure causes harm, they often use a system called GRADE to rate how confident they are in the evidence. Randomized controlled trials start at the highest confidence level, while observational studies start lower. But observational evidence can be upgraded based on three factors directly tied to causation: strength of association, dose-response, and whether any remaining biases would actually push the results in the opposite direction.
When well-conducted observational studies show both a strong association and a clear dose-response pattern, the evidence can be upgraded by two levels, from low to high certainty. However, if the evidence has already been downgraded for problems such as inconsistency or imprecision, it is generally not upgraded afterward, since doing so would overstate how reliable the findings are.
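The upgrade-and-downgrade logic can be sketched as a toy function. This is an illustrative simplification, not the official GRADE algorithm; the function name and the four-level encoding are my own:

```python
LEVELS = ["very low", "low", "moderate", "high"]

def certainty(randomized, downgrades=0, strong_association=False, dose_response=False):
    """Toy sketch of a GRADE-style rating, not the official algorithm."""
    level = 3 if randomized else 1   # RCTs start "high"; observational studies start "low"
    level -= downgrades              # e.g. inconsistency, imprecision, risk of bias
    if not randomized and downgrades == 0:
        # Upgrading applies only when the evidence hasn't already been downgraded.
        level += int(strong_association) + int(dose_response)
    return LEVELS[max(0, min(level, 3))]

print(certainty(randomized=False, strong_association=True, dose_response=True))  # high
print(certainty(randomized=False, downgrades=1, strong_association=True))        # very low
```

The second call shows the asymmetry described above: once evidence has been downgraded, the strong association no longer buys an upgrade.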
The Counterfactual Test
The most intuitive way to think about causation is the counterfactual: would the outcome have been different if the cause hadn’t happened? If you ate shrimp and broke out in hives, the causal question is whether you would have broken out in hives anyway, in some alternate reality where you skipped the shrimp.
In research, this is formalized as the potential outcomes framework. For each individual, a causal effect is defined as the difference between what happens with the exposure and what would have happened without it. The problem, of course, is that you can never observe both scenarios for the same person at the same time. Researchers get around this by comparing groups: one that received the exposure and one that didn’t, designed to be as similar as possible in every other way. The assumptions required for this to work include that the intervention is well-defined enough to produce a clear, unique outcome, and that the groups are truly comparable.
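The potential outcomes logic can be made concrete in a small simulation. In the sketch below (all numbers invented), every unit carries both potential outcomes; randomization means we observe only one per unit, yet the difference in group means still recovers the true average effect:

```python
import random

random.seed(42)

# Each unit has two potential outcomes; the true individual effect is 10.
units = []
for _ in range(10_000):
    y0 = random.gauss(50, 5)       # outcome without the exposure
    units.append((y0, y0 + 10))    # outcome with the exposure

# A coin flip assigns exposure, so we observe exactly one outcome per unit.
treated, control = [], []
for y0, y1 in units:
    if random.random() < 0.5:
        treated.append(y1)
    else:
        control.append(y0)

ate_hat = sum(treated) / len(treated) - sum(control) / len(control)
print(round(ate_hat, 1))  # close to the true effect of 10
```

Randomization is doing the real work here: it makes the two groups comparable on average, so the unobservable individual-level comparison can be replaced by an observable group-level one.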
Three Levels of Causal Reasoning
The computer scientist Judea Pearl organized causal thinking into three increasingly powerful levels, sometimes called the Ladder of Causation. Each level can answer questions the one below it cannot.
The first level is association: observing patterns in data. This answers “what is?” questions. If people who drink coffee tend to live longer, that’s an association. It tells you nothing about whether coffee caused the longer life.

The second level is intervention: what happens when you actively change something? This answers “what if?” questions. If you take a group of people and make them drink coffee, then track their health, you’re moving beyond pattern-watching into experimental evidence.

The third level is counterfactual reasoning: imagining what would have happened under different circumstances. This answers “why?” questions and lets you reason backward. Was it the coffee that caused the improvement, or would it have happened anyway?
Most everyday data analysis operates at level one. Controlled experiments reach level two. Only causal models that can simulate alternative scenarios reach level three, and that’s where the deepest understanding of causation lives.
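The gap between the first two levels shows up clearly in a simulation. In this invented world, a hidden “health” variable drives both coffee drinking and lifespan while coffee itself does nothing; the observed association is large, but intervening (Pearl’s do-operator) reveals a near-zero causal effect:

```python
import random

random.seed(1)

def lifespan(healthy):
    # Coffee plays no role in lifespan in this toy world.
    return 80 + (5 if healthy else 0) + random.gauss(0, 1)

# Level 1 (association): passively observe the confounded world.
drinkers, abstainers = [], []
for _ in range(50_000):
    healthy = random.random() < 0.5
    coffee = random.random() < (0.8 if healthy else 0.2)  # health drives the coffee habit
    (drinkers if coffee else abstainers).append(lifespan(healthy))
assoc_gap = sum(drinkers) / len(drinkers) - sum(abstainers) / len(abstainers)

# Level 2 (intervention): force coffee on or off, regardless of health.
def do_coffee(force):
    healthy = random.random() < 0.5
    return lifespan(healthy)  # forcing coffee changes nothing here

n = 50_000
do_gap = (sum(do_coffee(True) for _ in range(n)) -
          sum(do_coffee(False) for _ in range(n))) / n

print(round(assoc_gap, 1))  # about 3 extra years "associated" with coffee
print(round(do_gap, 1))     # about 0: the intervention exposes the confounding
```

Level-one analysis would credit coffee with roughly three extra years of life; the level-two intervention shows the entire gap comes from the hidden health variable.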
Causation in Legal Contexts
The law has its own test for causation, and it splits the question into two parts. The first is called “but-for” causation: but for the defendant’s actions, would the harm have occurred? This is essentially the counterfactual test applied to liability. If you ran a red light and hit a pedestrian, the but-for question asks whether the pedestrian would have been hit if you had stopped.
But-for causation alone isn’t enough for legal liability. The law also requires proximate cause, which asks whether the connection between the action and the harm is close enough to be fair. If you ran a red light, caused a minor fender-bender, and one of the drivers involved later slipped on ice in the hospital parking lot and broke a leg, your red-light violation technically set the chain of events in motion. But a court would likely say that the broken leg was too remote to hold you responsible. Some courts have moved away from the but-for test entirely and rely on proximate cause alone, holding defendants liable when their actions are closely enough related to the result.
Time-Series Data and Statistical Causation
In economics and other fields that track variables over time, the Granger causality test offers a specific, narrow definition of causation. Variable A “Granger-causes” variable B if knowing A’s past values helps predict B’s future values better than B’s past values alone. This doesn’t prove true causation in the philosophical sense. It proves predictive precedence.
The test has strict technical requirements. Both time series must be stationary, meaning their statistical properties don’t drift over time. The prediction errors must be random (not patterned), and for the standard statistical test to be valid, those errors need to follow a normal distribution. Researchers use information criteria to determine how far back in time the model should look. When these conditions are met, Granger causality provides a useful, if limited, tool for identifying which variables carry forward-looking information about others.
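The core of the test can be sketched as a comparison of two ordinary least squares fits, one using only B’s past and one adding A’s past. This is a minimal sketch with simulated data; it deliberately skips the stationarity, lag-selection, and normality checks a real application would require:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)          # the candidate "cause"
y = np.zeros(n)
for t in range(1, n):
    # x leads y by one step, plus noise.
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.3 * rng.normal()

def rss(design, target):
    """Residual sum of squares from an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    return float(resid @ resid)

ones = np.ones(n - 1)
restricted = np.column_stack([ones, y[:-1]])             # y's own past only
unrestricted = np.column_stack([ones, y[:-1], x[:-1]])   # plus x's past

r_rss = rss(restricted, y[1:])
u_rss = rss(unrestricted, y[1:])
print(r_rss > 2 * u_rss)  # True: x's past carries real predictive information about y
```

The actual Granger test wraps this comparison in an F-test (which is where the normally distributed errors come in), but the logic is the same: if adding A’s lagged values meaningfully shrinks the prediction errors for B, A “Granger-causes” B.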
What Ties These Frameworks Together
Whether you’re a doctor evaluating whether a drug works, a lawyer proving negligence, or a scientist analyzing climate data, causation requires the same basic ingredients. The cause must precede the effect. The two must be reliably connected, not just coincidentally occurring together. And there must be no better explanation for the pattern you’re seeing. The specific tests differ because the stakes and the types of evidence differ, but the underlying logic is consistent: causation occurs when one event produces another, and you can demonstrate that the outcome wouldn’t have happened the same way without it.

