A low p-value means your data would be unlikely to occur if nothing interesting were actually going on. More specifically, it tells you the probability of seeing results at least as extreme as what you observed, assuming there’s no real effect. A p-value of 0.03, for example, means there’s only a 3% chance you’d get data at least this striking if the effect you’re testing for didn’t actually exist.
That sounds straightforward, but the p-value is one of the most misunderstood numbers in science. Understanding what it actually tells you, and what it doesn’t, can change how you read research headlines, medical studies, and even election polls.
What a P-Value Actually Measures
Every statistical test starts with what’s called a null hypothesis: the assumption that there’s no real difference or effect. A drug doesn’t work better than a placebo. A new teaching method doesn’t improve test scores. The null hypothesis is the boring explanation.
The p-value asks: if the boring explanation were true, how surprising would my data be? A low p-value means very surprising. A high p-value means not surprising at all. When the data looks extremely unlikely under the boring explanation, researchers take that as evidence that something real is happening.
Crucially, the p-value assumes the null hypothesis is true from the start. It cannot tell you the probability that your hypothesis is correct. This distinction trips up even experienced researchers. A p-value of 0.02 does not mean there’s a 98% chance the treatment works. It means that if the treatment had zero effect, you’d only see data this extreme about 2% of the time.
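To make that conditioning concrete, here’s a minimal simulation sketch in Python. The scenario is invented for illustration (62 heads in 100 flips of a possibly biased coin): the null world is built first, and the p-value is simply the share of that world at least as extreme as the observed data.

```python
import numpy as np

# Toy example (invented numbers): we observed 62 heads in 100 flips and
# want the p-value under the null hypothesis "the coin is fair."
rng = np.random.default_rng(0)
observed_heads = 62
flips = 100

# Step 1: simulate many experiments in a world where the null is true.
null_heads = rng.binomial(n=flips, p=0.5, size=1_000_000)

# Step 2: the one-sided p-value is the fraction of null-world outcomes
# at least as extreme as what we actually saw.
p_value = np.mean(null_heads >= observed_heads)
print(f"p ~= {p_value:.4f}")  # about 0.01: surprising if the coin is fair
```

Notice that nothing in the calculation refers to the probability that the coin is biased; the null is assumed true throughout.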
Common Significance Thresholds
The most widely used cutoff is 0.05: if nothing real were going on, results at least this extreme would turn up through random variation alone only 5% of the time. This corresponds to 95% confidence and is the standard in most biomedical and social science research. Some areas, including parts of pharmaceutical research, use a stricter threshold of 0.01, corresponding to 99% confidence. When a p-value falls below 0.001, researchers typically just report “p < 0.001” rather than stating the exact number.
Different fields set very different bars. Particle physics uses what’s known as a “five sigma” threshold, which translates to a p-value of about 0.0000003. This is the standard CERN applied when confirming the discovery of the Higgs boson. The reasoning is practical: physicists run enormous numbers of tests and need to be nearly certain a signal isn’t a fluke. Election polling sits at the other end: polls conventionally report results at 95% confidence, roughly a two-sigma standard.
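For reference, converting a sigma level to a tail p-value is a one-liner with the standard normal survival function (using the one-sided convention particle physics follows):

```python
from scipy.stats import norm

# One-sided tail probability for a given number of standard deviations.
for sigma in (2, 3, 5):
    print(f"{sigma} sigma -> p ~= {norm.sf(sigma):.2e}")
# 2 sigma -> p ~= 2.28e-02
# 3 sigma -> p ~= 1.35e-03
# 5 sigma -> p ~= 2.87e-07   (the particle-physics bar)
```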
These thresholds are conventions, not laws of nature. A p-value of 0.049 and a p-value of 0.051 represent nearly identical evidence, even though one crosses the line and the other doesn’t.
Why a Low P-Value Can Be Misleading
Here’s something that surprises most people: a p-value is heavily influenced by sample size. The larger your dataset, the smaller your p-value tends to be, even when the actual difference you’re measuring is trivially small. A study with 100,000 participants might find a statistically significant difference between two groups that amounts to, say, a 0.2-pound weight difference. The p-value could be well below 0.001, but the real-world meaning of that difference is essentially zero.
This happens because larger samples estimate averages more precisely. Greater precision makes it easier to detect any deviation from the null hypothesis, no matter how tiny. Researchers working with large datasets routinely find statistically significant results for differences that have no practical importance whatsoever.
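A short simulation shows the effect. The numbers here are invented (a true difference of 0.2 between group means, standard deviation 10, echoing the weight example above), but the pattern is general:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
for n in (100, 1_000, 100_000):
    a = rng.normal(loc=0.0, scale=10.0, size=n)  # group A
    b = rng.normal(loc=0.2, scale=10.0, size=n)  # group B: tiny true shift
    p = ttest_ind(a, b).pvalue
    print(f"n = {n:>7,} per group: p = {p:.2g}")
# Typical output: p is nowhere near significance at n = 100, then
# shrinks by orders of magnitude at n = 100,000 -- yet the underlying
# difference never changed.
```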
A low p-value also doesn’t tell you how big the effect is. It only tells you that some effect probably exists. As one influential statistician put it, “statistical significance is the least interesting thing about the results.” The more important question is always: how much does it matter?
Statistical Significance vs. Real-World Importance
This is the distinction between statistical significance and what researchers call effect size. A p-value tells you whether an effect exists. Effect size tells you whether it’s worth caring about. Both pieces of information are essential, and a p-value alone gives you only half the picture.
Consider a study on a new blood pressure medication. The study might produce a p-value of 0.001, strong evidence that the drug has some effect. But if it only lowers blood pressure by 1 mmHg on average, that’s not clinically meaningful. Meanwhile, a smaller study of a different drug might show a 15 mmHg reduction with a p-value of 0.04. The second result is far more useful, even though its p-value is less impressive.
Unlike p-values, effect size doesn’t depend on sample size. It’s a direct measure of how large the difference is between groups. When you’re reading about a study’s results, the effect size is usually the number that actually answers the question you care about.
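For two-group comparisons, one standard effect-size measure is Cohen’s d. Here is a minimal sketch, with the blood pressure arithmetic redone under an assumed standard deviation of 10 mmHg (a made-up but plausible figure):

```python
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d: difference in group means divided by the pooled
    standard deviation. Unlike a p-value, it does not shrink or grow
    just because the sample gets bigger."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# With an assumed spread of 10 mmHg, the two drugs above work out to:
#   1 mmHg reduction  -> d = 0.1  (trivial)
#   15 mmHg reduction -> d = 1.5  (very large)
# A common rule of thumb reads d ~ 0.2 as small, 0.5 medium, 0.8 large.
```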
The Most Common Misinterpretations
The American Statistical Association issued a formal statement in 2016 specifically because p-values were being so widely misused. Its core principles are worth knowing:
- A p-value does not measure the probability that your hypothesis is true. A p-value of 0.03 does not mean there’s a 97% chance the treatment works. It measures data compatibility with the null hypothesis, not the truth of any particular claim.
- A p-value does not tell you the probability that your results are due to chance. This is the single most common misreading. The calculation already assumes chance is the only explanation, then asks how likely the data would be under that assumption.
- Crossing a threshold doesn’t make a result important. Scientific conclusions and policy decisions should never rest on whether a p-value passes a specific cutoff. Context, effect size, and study design all matter.
- A p-value says nothing about effect size. A tiny, meaningless difference can produce a very low p-value with enough data.
- A p-value alone is weak evidence. By itself, it doesn’t provide a good measure of evidence for or against a hypothesis. It needs to be interpreted alongside other information.
How to Read P-Values in the Real World
When you encounter a low p-value in a news headline or study summary, ask three questions. First, how big was the actual effect? A “statistically significant” finding could describe anything from a life-saving drug to a meaningless blip. Second, how large was the study? Very large studies can make trivial differences look significant. Third, was this result predicted in advance, or discovered after sifting through many possible comparisons? Testing dozens of variables and then highlighting the one that crossed p = 0.05 is a well-known way to produce false positives.
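That third question is easy to demonstrate. In the sketch below (invented setup: 20 variables of pure noise, 50 samples per group), every null hypothesis is true, yet at least one test clears p < 0.05 in roughly two-thirds of runs:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
trials, n_vars, n = 1_000, 20, 50
runs_with_false_alarm = 0
for _ in range(trials):
    # Test 20 variables on which the two groups genuinely do not differ.
    ps = [ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
          for _ in range(n_vars)]
    runs_with_false_alarm += min(ps) < 0.05
print(f"At least one 'significant' result in "
      f"{runs_with_false_alarm / trials:.0%} of runs")
# Expected: about 1 - 0.95**20 ~= 64%, despite no real effects anywhere.
```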
It’s tempting to read a p-value of 0.02 as a 2% probability of a false alarm, but that’s the misreading flagged above: the p-value can’t tell you how likely any individual finding is to be spurious. What the conventions do guarantee is that, among studies where nothing real is happening, about 5% will still clear a 0.05 cutoff, assuming everything about the study was done properly. Across thousands of studies published each year, that adds up: 10,000 true-null studies tested at 0.05 would be expected to yield around 500 spurious “discoveries.” This is one reason replication, repeating a study to see if the results hold, matters so much in science.
The p-value is a useful tool when understood correctly. It quantifies how incompatible your data is with the assumption that nothing is happening. But it’s a narrow measurement. It doesn’t tell you how likely a finding is to be true, how large the effect is, or whether anyone should change their behavior because of it. Those answers require looking beyond the number.

