A lower p-value generally indicates stronger evidence against the assumption that there’s no real difference or effect in a study. But “lower is better” oversimplifies things. A very low p-value can be misleading if the actual size of the effect is tiny, and a p-value just above the common cutoff can still point to something meaningful. Understanding what p-values actually measure, and what they don’t, is the key to reading them correctly.
What a P-Value Actually Tells You
A p-value measures how compatible your data are with the assumption that nothing interesting is happening. In statistics, that assumption is called the null hypothesis. If you’re testing whether a new drug lowers blood pressure more than a placebo, the null hypothesis says there’s no real difference between the two. The p-value then tells you: if there truly were no difference, how likely would you be to see results at least as extreme as what the study found?
A p-value of 0.03, for example, means there’s a 3% chance of seeing data at least this extreme if the drug and placebo were truly identical. That’s a fairly small probability, so researchers would typically conclude the drug probably does have a real effect. A p-value of 0.40 means there’s a 40% chance, which gives you little reason to rule out coincidence.
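To make that concrete, here’s a minimal sketch of where such a number comes from, using a standard two-sample t-test on made-up blood-pressure data. The group means, spread, and sample sizes are all hypothetical, and the choice of test is an assumption for illustration:

```python
# A hypothetical drug-vs-placebo comparison; all numbers are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated change in systolic blood pressure (mmHg) for each group.
drug = rng.normal(loc=-8, scale=12, size=50)     # drug group: average drop of 8
placebo = rng.normal(loc=-2, scale=12, size=50)  # placebo group: average drop of 2

# Two-sample t-test: if the null hypothesis (no real difference) were
# true, how likely are results at least this extreme?
result = stats.ttest_ind(drug, placebo)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```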
So yes, in that narrow sense, lower is better. A smaller p-value means the data are harder to explain away as random noise.
Why 0.05 Is the Standard Cutoff
Most fields use 0.05 as the dividing line for “statistical significance.” The convention traces back to the statistician Ronald Fisher, who treated a 1-in-20 chance of a result arising by luck alone as a reasonable threshold for calling it significant. If your p-value falls below 0.05, you reject the null hypothesis. If it’s above 0.05, you don’t.
That said, 0.05 is arbitrary. It’s a convention, not a law of nature. Evidence sits on a continuum: a p-value of 0.049 is not meaningfully different from 0.051, even though one crosses the line and the other doesn’t. Some researchers argue that results with p-values below 0.10 are “trending toward significance” and may still be clinically relevant, especially in smaller studies. In exploratory or screening analyses, such as deciding which variables to keep in a model, it’s common to relax the cutoff to 0.10 or even 0.15. Other researchers have pushed to tighten the standard to 0.005 to reduce false positives.
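One way to see why any fixed line is arbitrary: for a well-calibrated test, p-values fall roughly uniformly when the null hypothesis is true, so the cutoff simply sets your false-positive rate. A quick simulation (assuming groups of 30 and a plain t-test) shows about 5% of null experiments landing below 0.05 and about 0.5% below the stricter 0.005:

```python
# Repeatedly test two groups drawn from the SAME distribution,
# so the null hypothesis is true in every experiment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p_values = []
for _ in range(10_000):
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    p_values.append(stats.ttest_ind(a, b).pvalue)

p_values = np.array(p_values)
print(f"p < 0.05:  {(p_values < 0.05).mean():.3f}")   # ~0.05
print(f"p < 0.005: {(p_values < 0.005).mean():.3f}")  # ~0.005
```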
Where “Lower Is Better” Breaks Down
The biggest problem with chasing low p-values is that an impressively small p-value can sit on top of a trivially small, meaningless effect. P-values depend on two things: the size of the effect and the size of the sample. With a large enough sample, you can get a statistically significant p-value for a difference so small it has no practical importance whatsoever.
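Here’s a small simulation of that point, assuming a trivially small true difference of 0.5 mmHg (a made-up figure): the same effect is statistically invisible at 100 people per group and highly significant at 100,000:

```python
# The same tiny true effect, tested at two very different sample sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
for n in (100, 100_000):
    drug = rng.normal(loc=-0.5, scale=12, size=n)   # tiny true difference
    placebo = rng.normal(loc=0.0, scale=12, size=n)
    p = stats.ttest_ind(drug, placebo).pvalue
    print(f"n = {n:>7} per group: p = {p:.4g}")
```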
A well-known example involves a large study on aspirin and heart disease. The study found a statistically significant benefit, but the actual risk difference was only 0.77%. That’s an extremely small effect. Based on that result, many people were advised to take aspirin who were unlikely to benefit, while still being exposed to side effects like stomach bleeding. The p-value looked impressive; the real-world impact was negligible.
Consider another example: two cancer drugs both produce a p-value of 0.01. Drug A extends survival by five years. Drug B extends it by five months at a much higher cost. The statistics look identical, but the clinical reality is completely different. The p-value alone can’t tell you which result actually matters.
What P-Values Don’t Measure
The American Statistical Association released a formal statement in 2016 identifying six principles about p-values, and several of them directly address common misunderstandings. The most important ones to remember:
- A p-value is not the probability that the hypothesis is true. A p-value of 0.05 does not mean there’s a 5% chance the null hypothesis is correct, or a 95% chance your treatment works. It only describes how surprising the data would be under one specific assumption. (The simulation after this list shows how far apart those two ideas can be.)
- A p-value does not measure the size or importance of an effect. It tells you whether an effect likely exists, not whether it’s large enough to care about.
- Decisions should not rest on a p-value alone. A single number crossing a single threshold is not enough to draw a scientific conclusion, set a policy, or change a medical recommendation.
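The first principle is worth a demonstration. Suppose, purely as an assumption for illustration, that only 10% of the hypotheses a field tests describe real effects. A simulation then shows that a large share of results with p < 0.05 still come from true nulls, nothing like the “95% chance it works” misreading:

```python
# How often is a "significant" result a false positive? Assume only
# 10% of tested hypotheses are real effects (a made-up base rate).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
significant, false_positives = 0, 0

for _ in range(20_000):
    effect_is_real = rng.random() < 0.10       # assumed 10% base rate
    shift = 0.5 if effect_is_real else 0.0     # modest effect when real
    a = rng.normal(loc=shift, size=30)
    b = rng.normal(loc=0.0, size=30)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        significant += 1
        if not effect_is_real:
            false_positives += 1

print(f"share of significant results that are false positives: "
      f"{false_positives / significant:.2f}")
```

Under these assumed numbers, nearly half of the “significant” findings are false alarms.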
These misinterpretations are remarkably common, even among researchers. Treating a low p-value as proof that something important has been found is one of the most persistent errors in how science gets reported and understood.
Effect Size: The Missing Piece
If a p-value tells you whether an effect likely exists, effect size tells you how big it is. And in practical terms, how big an effect is matters far more than whether it clears a statistical threshold.
Effect size is independent of sample size. A study of 100 people and a study of 100,000 people can find the same effect size, but the larger study will almost always produce a smaller p-value, simply because more data makes the statistical test more sensitive. This is why p-values are sometimes described as “confounded” by sample size. A statistically significant result sometimes means only that a huge sample was used.
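To see what “independent of sample size” means, here’s a small helper for Cohen’s d, a common standardized effect size (the summary numbers fed to it below are made up). The sample sizes enter only through the variance pooling, so growing n a thousandfold leaves d unchanged, even as the p-value keeps shrinking:

```python
# Cohen's d: the mean difference in units of the pooled standard deviation.
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    return (mean1 - mean2) / pooled_sd

# Identical means and spreads give the same d at any sample size.
print(cohens_d(-8, -2, 12, 12, 50, 50))          # -0.5
print(cohens_d(-8, -2, 12, 12, 50_000, 50_000))  # -0.5
```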
When reading a study, the most informative combination is a low p-value paired with a meaningful effect size. A low p-value with a trivial effect size tells you the study was probably just very large. A moderate p-value with a large effect size in a small study may point to something genuinely important that needs further investigation.
How Low P-Values Can Be Manufactured
The pressure to publish results with low p-values has created a well-documented problem called p-hacking. This happens when researchers try multiple statistical approaches or tweak their data until a nonsignificant result becomes significant. Common techniques include running analyses partway through an experiment to decide whether to keep collecting data, measuring many outcomes and only reporting the ones that reach significance, removing outliers after seeing the results, and combining or splitting groups until something works.
None of these practices are necessarily fraudulent on their own, but when they’re done specifically to push a p-value below 0.05, they inflate the apparent strength of the evidence. A p-value of 0.03 that was the product of testing dozens of analyses is far less meaningful than one that came from a single pre-planned test. This is one reason why a lower p-value isn’t automatically more trustworthy: the methods behind it matter just as much as the number itself.
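One of the techniques above, measuring many outcomes and reporting only the significant ones, is easy to simulate. Assuming 20 independent outcomes with no real effects anywhere, the chance that at least one clears 0.05 by luck is 1 - 0.95^20, or about 64%:

```python
# Test 20 null outcomes per "study" and keep only the best p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_studies, hits = 5_000, 0

for _ in range(n_studies):
    best_p = min(
        stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
        for _ in range(20)  # 20 outcomes, none with a real effect
    )
    if best_p < 0.05:
        hits += 1

print(f"studies with at least one 'significant' outcome: {hits / n_studies:.2f}")
```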
How to Read P-Values in Practice
When you encounter a p-value in a study or news article, a few questions will help you interpret it more accurately than simply checking whether it’s low enough. First, how large was the study? Very large studies can produce impressively small p-values for differences too small to matter in real life. Second, what was the actual effect? A drug that lowers blood pressure by 1 mmHg with a p-value of 0.001 is less useful than one that lowers it by 15 mmHg with a p-value of 0.04.
Third, was the analysis planned in advance, or does the study show signs of fishing for results? Pre-registered studies, where researchers publicly commit to their methods before collecting data, are more reliable on this front. And finally, has the result been replicated? A single study with a low p-value is a starting point, not a conclusion. The strength of evidence comes from consistent findings across multiple studies, not from one number in one paper.
A lower p-value is one useful signal among several. It’s stronger evidence against the null hypothesis, all else being equal. But all else is rarely equal, and the p-value by itself tells you nothing about whether a finding is large, important, or worth acting on.

