Occam’s razor is a problem-solving principle stating that when two explanations account for the same evidence equally well, the one requiring fewer assumptions is more likely correct. Often paraphrased as “the simplest explanation is usually the best,” the razor is named after William of Ockham, a 14th-century English friar and philosopher who wrote, “Plurality must not be posited without necessity.” It’s not a law of nature or a proof of anything. It’s a guideline for choosing between competing ideas when the evidence alone can’t settle the matter.
What the Principle Actually Says
The most common misunderstanding of Occam’s razor is that it always favors the simplest explanation, full stop. That’s not quite right. The razor only applies when two or more hypotheses make the same predictions and are equally supported by evidence. In that specific situation, you should prefer the one that requires the fewest assumptions. It is not a tool for choosing between hypotheses that make different predictions, because in that case you can just test which prediction turns out to be correct.
Think of it this way: if your car won’t start, one explanation is that the battery is dead. Another is that the battery is dead, your alternator is failing, and a squirrel chewed through a wire, all at once. If both explanations equally account for what you’re observing (the car won’t start, no dashboard lights), the single dead battery is the better starting assumption. You’re not saying the more complex explanation is impossible. You’re saying there’s no reason to add extra assumptions when a simpler one covers the facts just as well.
Why Simpler Theories Tell Us More
The philosopher Karl Popper offered one of the most compelling arguments for why simplicity matters in science. In his view, simple theories are more valuable because they’re easier to disprove. A simple theory makes sharper, more specific predictions. If those predictions turn out wrong, you know the theory is wrong. A complex theory with many adjustable parts can wiggle its way into fitting almost any result, which sounds like a strength but is actually a weakness: it becomes nearly impossible to tell whether the theory is genuinely correct or just flexible enough to absorb any data you throw at it.
Popper put it directly: simple statements “are to be prized more highly than less simple ones because they tell us more; because their empirical content is greater; and because they are better testable.” A theory that can be disproved by a greater range of data is, paradoxically, more trustworthy when it survives that testing.
How Bayesian Statistics Formalizes the Razor
For centuries, Occam’s razor was a philosophical guideline. Modern statistics turned it into something you can calculate. In Bayesian statistics, when you compare two models that both fit your data, the framework automatically penalizes the more complex one. The reason is intuitive once you see it: a flexible model with many adjustable parts can fit a huge range of possible outcomes. Most of those possible outcomes have nothing to do with what actually happened. So when you average across all the things a complex model could predict, its overall probability of producing the specific data you observed is diluted.
A simpler model, by contrast, makes fewer predictions but concentrates its probability on a narrower range. If the data fall within that range, the simpler model gets more credit precisely because it went out on a limb with a more specific prediction. This built-in penalty for complexity isn’t something statisticians bolt on as a subjective preference. It emerges naturally from the math whenever you compare how well different explanations account for the same observations.
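This dilution effect shows up even in a toy comparison. The sketch below (in Python, with the coin-flip scenario and numbers invented purely for illustration) pits a fair-coin model, which fixes the heads probability at 0.5, against a flexible model that allows any bias between 0 and 1 with a uniform prior. For a moderately balanced outcome like 6 heads in 10 flips, the rigid model assigns the data more probability, because the flexible model spreads its probability across every outcome it could have explained:

```python
from math import comb

def marginal_fair(k, n):
    # Fair-coin model: no free parameters, all probability on theta = 0.5.
    return comb(n, k) * 0.5 ** n

def marginal_biased(k, n):
    # Biased-coin model: heads probability theta unknown, uniform prior
    # on [0, 1]. Averaging the binomial likelihood over theta gives the
    # Beta function B(k+1, n-k+1), which works out to 1 / (n + 1) for
    # every k -- the model's probability is spread evenly over all
    # possible head counts.
    return 1 / (n + 1)

k, n = 6, 10
print(marginal_fair(k, n))    # 210/1024, about 0.205
print(marginal_biased(k, n))  # 1/11, about 0.091
```

The flexible model would win for an extreme result like 10 heads out of 10, which the fair-coin model nearly rules out. That is the razor working as intended: complexity is penalized only until the data demand it.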
Complexity itself turns out to be more nuanced than just counting how many moving parts a model has. Statisticians have identified several distinct ways a model can be complex: the number of adjustable settings it has, the range of patterns it can produce, and even the shape of its predictions near the best fit. All of these contribute to the penalty, and all of them represent ways a model might be overfitting to noise rather than capturing something real.
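The parameter-counting kind of complexity is the easiest to operationalize. One standard tool is the Bayesian information criterion (BIC), which scores a fitted model as n·ln(RSS/n) + k·ln(n), where RSS is the residual sum of squares, n the number of data points, and k the number of fitted parameters; lower is better. The sketch below uses invented toy data (a straight line plus small alternating noise) and compares polynomial fits of increasing degree:

```python
import numpy as np

# Deterministic toy data: a straight line plus small alternating "noise".
x = np.arange(10.0)
y = 2.0 * x + 1.0 + 0.1 * (-1.0) ** np.arange(10)

def bic(degree):
    # Fit a polynomial of the given degree, then score it with the
    # Bayesian information criterion: n*ln(RSS/n) + k*ln(n).
    # The second term is the explicit penalty for extra parameters.
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    n, k = len(x), degree + 1
    return n * np.log(rss / n) + k * np.log(n)

print(bic(1))  # straight line: small penalty
print(bic(2))  # quadratic: no better fit, bigger penalty
print(bic(4))  # quartic: fits the noise slightly better, bigger penalty still
```

The higher-degree fits shave a little off the residuals by chasing the noise, but not enough to pay for their extra parameters, so the line wins. BIC captures only the parameter-counting dimension of complexity; fuller treatments, such as minimum description length, also account for the range and shape aspects mentioned above.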
The Razor in Machine Learning
Overfitting is one of the central problems in machine learning. When a model is too complex relative to the data it’s trained on, it memorizes the noise and quirks of that particular dataset instead of learning the underlying pattern. The result is a model that performs beautifully on its training data and terribly on anything new. The problem is compounded by sheer numbers: the count of distinct functions a model class can represent grows exponentially with its complexity, so an unbiased learner faces an overwhelming number of candidate explanations and can easily latch onto a wrong one that happens to fit.
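The memorization failure is easy to reproduce. In this sketch (toy data invented for illustration), a degree-9 polynomial has exactly as many coefficients as there are training points, so it passes through all ten noisy points perfectly, while a straight line is forced to average the noise away. Asked to predict one step beyond the training range, the memorizing polynomial is wildly off and the line is nearly exact:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Toy training data: a straight line plus small alternating "noise".
x = np.arange(10.0)
y = 2.0 * x + 1.0 + 0.1 * (-1.0) ** np.arange(10)

# Ten points, ten coefficients: the degree-9 fit memorizes every
# training point exactly, noise included.
overfit = Polynomial.fit(x, y, 9)
# Two coefficients: the line has to smooth the noise out.
line = Polynomial.fit(x, y, 1)

x_new = 10.0               # one step outside the training range
truth = 2.0 * x_new + 1.0  # the underlying pattern says 21
print(overfit(x_new))      # far from 21: the memorized wiggles explode
print(line(x_new))         # close to 21
```

Zero training error, in other words, is not the goal; the overfit model scores perfectly on the data it has seen and fails immediately on data it has not.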
Research published in Nature Communications in 2024 found that deep neural networks, the architecture behind most modern AI, have what the authors described as “an inbuilt Occam’s razor.” These networks naturally gravitate toward simpler functions even when they have far more capacity than they need. This bias toward simplicity counteracts the exponential explosion of complex possibilities and helps explain why these networks generalize well to new data despite being theoretically capable of memorizing anything. When this built-in preference for simplicity is even slightly weakened, the network’s performance degrades sharply, suffering from the kind of variance problems that come with chasing noise.
The Razor in Medicine
Doctors learn a version of Occam’s razor early in training: when a patient has a complex set of symptoms, try to explain them with a single diagnosis before invoking two or more unrelated conditions. The reasoning is sound. If one disease process can account for everything you’re seeing, adding a second unrelated diagnosis means adding assumptions that may not be warranted.
But medicine also has a famous counterpoint called Hickam’s dictum, attributed to the physician John Bamber Hickam: “A patient can have as many diseases as he damn well pleases.” This isn’t a rejection of parsimony so much as a reminder that real patients, especially older ones with multiple health conditions, don’t always fit tidy single-cause explanations. Hickam’s dictum serves as a guard against premature diagnostic closure, the mistake of settling on one diagnosis too quickly and missing a second condition that also needs treatment.
In practice, experienced clinicians use both principles in tension. The goal is to attempt a single explanation first, because that discipline forces you to look for connections between symptoms you might otherwise treat as unrelated. But when a single explanation requires too many unlikely assumptions or improbable coincidences, it’s time to accept that more than one thing may be going on.
Where the Razor Can Mislead
Occam’s razor is a heuristic, not a guarantee. The simplest explanation is not always the correct one. Reality is sometimes genuinely complex, and a preference for parsimony can lead you to dismiss a more complicated but accurate explanation in favor of a tidy but wrong one. The razor works best as a tiebreaker when evidence is equal, not as a substitute for gathering more evidence in the first place.
The popular version of the principle, “the simplest explanation is usually correct,” strips away the important qualifier about equal explanatory power. Two explanations are only comparable under the razor if they account for the same data equally well. If a more complex theory explains observations that the simpler one cannot, the razor doesn’t apply. You should prefer the theory that actually fits the evidence, even if it’s more complicated. Simplicity is a preference, not a trump card.
There’s also the question of what “simple” means. Fewer assumptions? Fewer variables? Easier to understand? These don’t always point to the same theory. Einstein’s general relativity is conceptually more parsimonious than Newton’s gravity (one framework instead of separate rules for different situations), but the math is far more complex. Whether that counts as “simpler” depends entirely on what kind of simplicity you’re measuring, which is part of why the razor remains a guideline rather than a rigid rule.