The broken leg test is a thought experiment from psychology that describes when common sense should override a statistical prediction. Coined by psychologist Paul Meehl in 1954, it illustrates a simple idea: sometimes a human observer knows something obvious and important that no formula could account for.
The concept comes up in psychology, medicine, and increasingly in conversations about artificial intelligence and algorithmic decision-making. It’s not a physical test for diagnosing a fracture, despite what the name might suggest.
The Original Thought Experiment
Meehl laid out the idea in his landmark book “Clinical versus Statistical Prediction.” Imagine you’ve built a statistical model that predicts whether a particular person will go to the movies on a given night. The model is good. It factors in the person’s habits, the day of the week, what’s playing, the weather. It correctly predicts their behavior most of the time.
Now imagine you learn that this person broke their leg yesterday. No formula includes “broken leg” as a variable, because it’s rare and unpredictable. But you, the human observer, immediately know it changes everything. The person almost certainly isn’t going to the movies tonight. As Meehl put it, “a broken leg is a rare and improbable event in a patient’s life. Yet it will prevent the patient from carrying out a habitual activity.” To a clinician, that broken leg is highly relevant. In the mechanical computation of a statistical formula, it doesn’t figure at all.
The broken leg test, then, is the question you ask before overriding a data-driven prediction: do I have a genuine “broken leg” piece of information, something rare, concrete, and obviously decisive? Or am I just second-guessing the numbers with a hunch?
Why Meehl Thought It Mattered
Meehl wasn’t arguing that human judgment is generally better than statistical models. In fact, his career-long argument was the opposite. He reviewed study after study comparing clinical predictions (a trained expert using their experience and intuition) against actuarial predictions (a simple formula crunching data). The formulas won almost every time. Meta-analyses conducted since have confirmed that actuarial methods outperform clinical judgment on average across a wide range of fields, from predicting criminal recidivism to diagnosing psychiatric conditions.
His conclusion that statistical prediction is generally superior has held up remarkably well for over 70 years. Later researchers have described his conceptual analysis of the prediction problem, in particular his defense of applying population-level probabilities to individual cases, as work that has not been significantly improved upon since 1954.
But Meehl was intellectually honest enough to acknowledge an exception. Sometimes a clinician has access to a piece of information so unusual and so clearly relevant that the formula can’t capture it. The broken leg test is his name for that exception. It’s narrow by design. He wanted clinicians to recognize that their instinct to override data is almost always wrong, except in those rare, obvious cases.
How to Tell a “Broken Leg” From a Hunch
The distinction matters because people routinely overestimate the quality of their own judgment. A true broken-leg situation has three characteristics. The information is concrete and verifiable, not a vague feeling. It is clearly and logically connected to the outcome being predicted. And it is genuinely unusual, something the model wasn’t designed to encounter.
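The three criteria can be thought of as a checklist where every box must be ticked before an override is justified. The sketch below is purely illustrative, not a validated instrument; the field names and the all-three requirement are assumptions chosen to mirror the description above.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """A candidate reason for overriding a statistical prediction.
    Field names are hypothetical, chosen for this illustration."""
    concrete_and_verifiable: bool    # a checkable fact, not a vague feeling
    clearly_linked_to_outcome: bool  # logically connected to the prediction
    outside_model_scope: bool        # rare; the model was never built for it

def is_broken_leg(obs: Observation) -> bool:
    """Meehl's test: override the formula only if ALL three criteria hold."""
    return (obs.concrete_and_verifiable
            and obs.clearly_linked_to_outcome
            and obs.outside_model_scope)

# A genuine broken leg: concrete, decisive, and outside the model's world.
fracture = Observation(True, True, True)
# A hunch: nothing concrete, just discomfort with the numbers.
hunch = Observation(False, False, False)

print(is_broken_leg(fracture))  # True
print(is_broken_leg(hunch))     # False
```

The point of the conjunction is that failing any one test means you are second-guessing the numbers, not correcting them.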
If a heart risk calculator says a patient has low cardiovascular risk, but the clinician notices the patient is clutching their chest and sweating, that’s a broken-leg moment. The model doesn’t know what’s happening in the room right now. On the other hand, if a validated screening tool says imaging isn’t needed for an ankle injury but a provider orders an X-ray anyway “just to be safe,” that’s more likely a hunch overriding good data. Studies on the Ottawa Ankle Rules, a well-validated set of criteria for deciding when ankle X-rays are necessary, show that clinicians frequently override the rules even when the rules are right. The rules catch between 96% and 99% of fractures, yet providers consistently order unnecessary imaging because their gut tells them to.
Research on clinical decision support systems has found this pattern repeatedly. Even when algorithms flag that a test is unlikely to be useful, providers often proceed anyway. One study on ankle injury imaging found that despite alerts advising against unnecessary X-rays, radiography use remained suboptimal. Providers could cancel the order or ignore the alert, and many chose to ignore it.
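Part of what makes the Ottawa Ankle Rules so overridable is how compact they are. The ankle portion of the published criteria fits in a few lines, as the sketch below shows; the parameter names are assumptions for this example, and real clinical use obviously requires the validated instrument and a trained examiner, not a toy function.

```python
def ankle_xray_indicated(
    malleolar_zone_pain: bool,
    lateral_malleolus_tenderness: bool,  # bony tenderness, posterior edge or tip
    medial_malleolus_tenderness: bool,   # bony tenderness, posterior edge or tip
    can_bear_weight_four_steps: bool,    # both right after injury and on exam
) -> bool:
    """Ankle portion of the Ottawa Ankle Rules, as commonly stated:
    an X-ray is indicated only when there is pain in the malleolar zone
    plus at least one of the three findings below."""
    if not malleolar_zone_pain:
        return False
    return (lateral_malleolus_tenderness
            or medial_malleolus_tenderness
            or not can_bear_weight_four_steps)

# Sprain with no bony tenderness and a normal gait: rules say no imaging.
print(ankle_xray_indicated(True, False, False, True))   # False
# Malleolar pain and unable to walk four steps: imaging indicated.
print(ankle_xray_indicated(True, False, False, False))  # True
```

When a provider orders an X-ray for the first case anyway, that is the override behavior the studies above describe: the rule returned a clear answer, and the gut disagreed without a broken-leg reason.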
The Concept in the Age of Algorithms
Meehl was writing about psychology and psychiatry in the 1950s, but the broken leg test has become newly relevant as algorithmic and AI-based decision tools spread into hiring, lending, criminal justice, and everyday medicine. Any time a person is asked to follow or override an algorithm’s recommendation, they face a version of Meehl’s question.
The core tension is this: algorithms are better than humans at combining lots of variables into a consistent prediction. They don’t get tired, they don’t have biases that shift from morning to afternoon, and they weigh evidence the same way every time. In some cases, remarkably simple models perform nearly as well as complex ones. One study found that a model using just two decision rules and evaluating only two to four pieces of information predicted outcomes about as accurately as a traditional statistical model incorporating 20 variables.
But algorithms only know what they’ve been trained on. They can miss context that would be immediately obvious to a person in the room. A hiring algorithm doesn’t know the candidate’s car broke down on the way to the interview. A risk-assessment tool doesn’t know a patient just lost their spouse. These are broken-leg situations, and they’re where human oversight genuinely adds value.
The problem is that people invoke the broken leg test far too liberally. They use it to justify overriding data whenever the data’s recommendation feels uncomfortable. Meehl’s whole point was that this should be the exception, not the rule. If you find yourself overriding the algorithm regularly, you probably don’t have a broken leg. You have a bias.
What the Broken Leg Test Is Not
Because of the name, some people land on this term expecting a physical screening test for leg fractures. That’s a different topic entirely. In orthopedic medicine, clinicians use a range of physical examination techniques to assess lower limb injuries: the squeeze test for ligament damage between the shin bones, the anterior drawer sign for ankle ligament tears, and tools like tuning fork vibration tests to help rule out fractures. The Ottawa Ankle Rules provide a validated checklist for deciding whether an ankle or foot injury needs an X-ray. None of these are called “the broken leg test.”
Meehl’s broken leg test lives in the world of decision science, not orthopedics. It’s a mental model for knowing when to trust data and when to trust your eyes, with the strong caveat that you should almost always trust the data.