What Is a Causal Relationship and How Do You Prove It?

A causal relationship exists when one event or condition directly produces a change in another. In formal terms, causation means an initial event (the exposure) affects the probability of a subsequent event (the outcome) occurring. This sounds simple, but proving that one thing actually causes another, rather than just happens alongside it, is one of the hardest problems in science and statistics.

The Three Requirements for Causation

For scientists to accept that A causes B, three conditions generally need to be met. First, A must come before B in time. This is called temporal precedence, and it’s the most widely accepted requirement. If you claim a medication caused a side effect, the patient must have taken the medication before the side effect appeared.

Second, A and B must move together in a predictable pattern. When A is present, B should be more (or less) likely. When A is absent, B should change accordingly. This is covariation. If smoking causes lung cancer, you’d expect smokers to develop lung cancer at higher rates than nonsmokers.

Third, the relationship can’t be explained away by something else. This is where things get tricky. If a hidden third factor is actually driving both A and B, the relationship between them is spurious, not causal. Ruling out these alternative explanations is often the most difficult part of establishing causation.

How Causation Differs From Correlation

Correlation is a statistical measure that describes how two variables move in relation to each other. It’s expressed as a number between -1.0 and +1.0. A positive value means both variables tend to increase together. A negative value means one goes up while the other goes down. A value of zero means no linear relationship, though the variables could still be related in some nonlinear way.
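As a concrete illustration, the most common correlation coefficient, Pearson’s r, can be computed directly from its definition. The function name and the sample data below are just for illustration, not from any particular study:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how the two variables co-vary around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: the product of the two spreads, which scales r into [-1, +1].
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Both variables rise in lockstep: r is +1.0.
print(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]))
# One rises while the other falls: r is -1.0.
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

Note that the coefficient is symmetric: swapping `xs` and `ys` gives the same value, which is one more reminder that the number carries no information about direction of cause and effect.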

But here’s the critical point: correlation only tells you that two things are related. It says nothing about whether one caused the other. Smoking is correlated with heavy alcohol use, for example, but smoking doesn’t cause alcoholism. The two behaviors share common risk factors, like stress or social environment, that can drive both. A correlation coefficient, no matter how strong, cannot by itself prove cause and effect.

You can find correlations between all sorts of variables that have nothing to do with each other. Ice cream sales and drowning deaths both rise in summer, not because ice cream is dangerous, but because hot weather independently drives both. This is exactly why scientists spend enormous effort distinguishing genuine causal links from coincidental patterns.

Confounding: The Hidden Third Variable

A confounding variable is a factor that influences both the supposed cause and the outcome, creating the illusion of a direct link between them. Confounding is common in observational studies, where researchers watch what happens naturally rather than controlling conditions. In medical research, for instance, patients with a worse prognosis often end up receiving different treatments than healthier patients. If you just compare outcomes between the two groups, you might wrongly conclude the treatment itself made the difference, when the patients’ underlying health was driving the results all along.

To qualify as a confounder, a variable must be associated with both the exposure and the outcome, and it can’t simply be a downstream consequence of either one. Researchers handle confounders through techniques like matching, where they select comparison groups that are as similar as possible on all the important characteristics except the one being studied. Once those groups are “fair,” any remaining difference in outcomes can more credibly be attributed to the treatment or exposure in question.
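Stratifying on a confounder, comparing like with like, can even flip a conclusion. The counts below are hypothetical, chosen so that sicker patients disproportionately receive treatment A, which drags down A’s overall numbers:

```python
# Hypothetical recovery counts, stored as (recovered, total) per arm,
# split by the confounder: disease severity.
strata = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

def rate(recovered, total):
    return recovered / total

# Within EVERY severity stratum, treatment A beats treatment B.
for severity, arms in strata.items():
    rates = {arm: round(rate(*counts), 2) for arm, counts in arms.items()}
    print(severity, rates)

# Yet pooled over both strata, B looks better: the confounder flips the sign.
overall = {}
for arm in ("A", "B"):
    recovered = sum(strata[s][arm][0] for s in strata)
    total = sum(strata[s][arm][1] for s in strata)
    overall[arm] = recovered / total
    print(arm, "overall:", round(overall[arm], 2))
```

This reversal, known as Simpson’s paradox, is why the naive pooled comparison in an observational study can point in exactly the wrong direction.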

The Post Hoc Fallacy

One of the most common errors in causal reasoning has a Latin name: post hoc ergo propter hoc, meaning “after this, therefore because of this.” It’s the mistake of assuming that because event A happened before event B, A must have caused B.

The classic illustration is the birthday example: nearly everyone turns 18 before graduating high school, but turning 18 doesn’t cause graduation. The timing is coincidental, not causal. A more consequential version of this fallacy has fueled some vaccine fears. Children typically receive a series of vaccinations in their first two years of life, and the early signs of autism tend to appear around the same age. The overlap in timing led some people to conclude vaccines caused autism, but extensive research has shown no causal link. The symptoms simply emerge at the same developmental stage that vaccinations happen to be scheduled.

How Scientists Prove Causation

Randomized controlled trials are considered the gold standard for establishing causal relationships. In an RCT, participants are randomly assigned to either receive a treatment or not. With a large enough sample, randomization balances the two groups on every characteristic, known and unknown, except the treatment itself. This eliminates confounding at baseline and isolates the treatment’s true effect.
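The balancing effect of randomization is easy to see in a quick simulation. The "frailty" score below is a made-up stand-in for any characteristic, measured or not, that could confound a naive comparison:

```python
import random

random.seed(42)

# Hypothetical cohort: each participant has an underlying "frailty" score
# that would bias a non-randomized comparison of outcomes.
cohort = [random.gauss(50, 10) for _ in range(10_000)]

# Random assignment: shuffle, then split down the middle.
random.shuffle(cohort)
treatment, control = cohort[:5_000], cohort[5_000:]

mean = lambda xs: sum(xs) / len(xs)
# With a large sample, the two arms end up nearly identical on frailty,
# even though no one measured or matched on it.
print(round(mean(treatment), 1), round(mean(control), 1))
```

The same logic applies to every other baseline characteristic at once, which is what no amount of manual matching on measured variables can guarantee.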

When experiments aren’t possible (you can’t randomly assign people to smoke for 30 years), researchers rely on a set of criteria originally proposed by the epidemiologist Austin Bradford Hill in the 1960s. These nine criteria help evaluate whether an observed association is likely causal:

  • Strength: A larger effect size makes causation more plausible.
  • Consistency: The same relationship shows up across different populations, settings, and time periods.
  • Specificity: The exposure leads to a particular outcome, not a vague collection of effects.
  • Temporality: The cause precedes the effect. This is the only criterion considered absolutely essential.
  • Dose-response: More exposure leads to a greater effect. Heavier smokers, for example, face higher lung cancer risk than light smokers.
  • Plausibility: A believable biological or physical mechanism can explain how the cause produces the effect.
  • Coherence: The causal interpretation doesn’t conflict with what’s already known about the disease or phenomenon.
  • Experiment: Experimental or semi-experimental evidence supports the relationship.
  • Analogy: Similar causes are already known to produce similar effects.

No single criterion can prove causation on its own, and only temporality is strictly required. Instead, scientists weigh the evidence across all of them. The smoking and lung cancer link, one of the most famous causal claims in public health, was established this way: the association was strong, consistent across dozens of studies, showed a clear dose-response pattern, and was supported by biological mechanisms explaining how tobacco smoke damages lung tissue.

The Counterfactual Test

Modern causal thinking often relies on a mental exercise called the counterfactual framework. The idea is straightforward: to know whether A caused B in a specific person, you’d need to compare what actually happened with what would have happened if everything were identical except that A never occurred. The causal effect for an individual is the difference between these two scenarios.

Of course, you can never observe both scenarios for the same person at the same time. You can’t simultaneously give someone a drug and not give them the drug. This is sometimes called the fundamental problem of causal inference. Researchers get around it by comparing groups of people and using statistical tools to estimate what the counterfactual outcome would have been.

How Researchers Map Causal Relationships

One increasingly common tool in causal reasoning is the Directed Acyclic Graph, or DAG. These are diagrams where each variable is represented as a dot (called a node), and arrows show the assumed direction of causal influence; “acyclic” means no chain of arrows can loop back to where it started. A DAG lets researchers visually lay out all their assumptions about what causes what, identify potential confounders, and determine which variables need to be accounted for in their analysis.

DAGs don’t prove causation by themselves. They’re a planning tool. By mapping the assumed relationships before running any analysis, researchers can spot sources of bias and choose the right statistical approach to address them. If a DAG reveals that a confounding variable connects both the exposure and the outcome, researchers know they need to adjust for it. If the DAG shows a variable is a downstream effect of the exposure rather than a confounder, they know to leave it alone. Getting this wrong can introduce bias rather than remove it, which is why the visual mapping step matters so much.
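A DAG is simple enough to encode directly. The toy graph below revisits the ice cream example; the simple common-cause check is a deliberate simplification (full adjustment-set logic uses the back-door criterion and d-separation), and all names are illustrative:

```python
# A toy DAG as a mapping from each node to the nodes its arrows point into.
# Assumed structure: weather drives both variables; there is no arrow
# from ice_cream to drowning.
dag = {
    "weather":   ["ice_cream", "drowning"],
    "ice_cream": [],
    "drowning":  [],
}

def parents(node):
    """Nodes with an arrow pointing directly into `node`."""
    return {src for src, targets in dag.items() if node in targets}

def common_causes(exposure, outcome):
    """Direct common causes of exposure and outcome. A simplification:
    proper adjustment sets come from the back-door criterion."""
    return parents(exposure) & parents(outcome)

# Weather is flagged as a variable to adjust for.
print(common_causes("ice_cream", "drowning"))
```

Note what the check would not flag: a variable that `ice_cream` itself causes has no arrow into it from the graph’s other sources, which matches the rule that downstream effects of the exposure should be left alone.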

Why It Matters in Everyday Thinking

Understanding causal relationships isn’t just an academic exercise. Every time you read a headline claiming a food “causes” weight loss or a habit “leads to” disease, you’re encountering a causal claim. Knowing the difference between correlation and causation helps you evaluate whether the evidence actually supports the claim or whether a confounding variable, a post hoc fallacy, or simple coincidence might explain the pattern instead.

The core question is always the same: if you removed the supposed cause and changed nothing else, would the outcome still happen? If the answer is yes, you’re looking at correlation. If the answer is no, and you’ve ruled out other explanations, you’re closer to genuine causation.