What Is Uncertainty in Statistics and How Is It Measured?

Uncertainty in statistics is a measure of how much a result might differ from the true value you’re trying to estimate. Every time you draw conclusions from limited data, whether that’s a medical study, a political poll, or a physics experiment, some degree of doubt comes along for the ride. Statistical tools exist specifically to quantify that doubt, turning a vague “we’re not totally sure” into a precise range or number you can reason with.

Why Uncertainty Exists

Uncertainty shows up for two fundamental reasons. The first is pure randomness. The natural world varies: people’s blood pressure fluctuates hour to hour, manufactured parts differ by tiny fractions of a millimeter, and survey respondents are drawn from a population that can’t be interviewed in full. This kind of uncertainty is sometimes called aleatory uncertainty, and it can’t be eliminated no matter how careful you are. You can measure it, model it, and shrink it with better study designs, but it never goes to zero.

The second source is incomplete knowledge. You might be using a simplified model, missing a variable, or working with a measurement tool that isn’t perfectly calibrated. This is epistemic uncertainty, and unlike randomness, it can theoretically be reduced. Gather more data, use a better instrument, or refine your model, and this type of uncertainty shrinks. The catch is that you often don’t know exactly how much epistemic uncertainty is hiding in your results, which makes it harder to quantify than randomness.

Statistical vs. Systematic Uncertainty

A related distinction matters in practice. Statistical uncertainty comes from random variation in repeated measurements or samples. If you time your commute ten mornings in a row, you’ll get ten slightly different numbers. That spread is statistical uncertainty, and it bounces randomly in both directions, sometimes too high, sometimes too low.

Systematic uncertainty is a consistent bias in one direction. If your watch runs two minutes slow, every time measurement will be too small by roughly the same amount. No amount of repeated measurement fixes this because the error isn’t random. Statistical uncertainty relates to precision (how tightly your measurements cluster together), while systematic uncertainty relates to accuracy (how close the cluster is to the real value). Good science needs to address both.
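A small simulation makes the distinction concrete. The sketch below uses made-up numbers: random scatter stands in for statistical uncertainty, and a constant offset mimics the slow watch. Averaging many measurements shrinks the scatter but leaves the bias untouched.

```python
import numpy as np

rng = np.random.default_rng(42)
true_commute = 30.0                          # hypothetical true commute time, in minutes
random_noise = rng.normal(0, 2, size=1000)   # statistical uncertainty: random scatter
watch_bias = -2.0                            # systematic uncertainty: watch reads 2 minutes short

measurements = true_commute + random_noise + watch_bias

# Averaging many measurements cancels most of the random scatter...
print(f"mean of measurements: {measurements.mean():.2f} min")
# ...but the result still sits about 2 minutes below the true value,
# because a systematic error does not average away.
print(f"remaining bias:       {measurements.mean() - true_commute:.2f} min")
```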

How Uncertainty Gets Measured

The most common way to express uncertainty is through a few related tools: standard deviation, standard error, and confidence intervals. Each one captures something different, and mixing them up is one of the most frequent mistakes in interpreting data.

Standard deviation (SD) describes how spread out individual data points are. For data that’s roughly bell-shaped, about two-thirds of values fall within one SD of the average, and about 95% fall within two SDs. It tells you about the variability in the thing you’re measuring.
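As a quick check on those coverage figures, here is a minimal sketch using simulated, bell-shaped data (the numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=120, scale=15, size=10_000)  # simulated bell-shaped measurements

mean = data.mean()
sd = data.std(ddof=1)                              # sample standard deviation

within_1sd = np.mean(np.abs(data - mean) < 1 * sd)
within_2sd = np.mean(np.abs(data - mean) < 2 * sd)
print(f"within 1 SD: {within_1sd:.1%}")            # roughly 68%
print(f"within 2 SD: {within_2sd:.1%}")            # roughly 95%
```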

Standard error (SE) tells you something different: how precisely you’ve estimated the average itself. It’s calculated by dividing the standard deviation by the square root of your sample size. This means that increasing your sample size makes the standard error smaller, but the relationship isn’t linear. To cut the standard error in half, you need four times as many observations, not twice as many. That square-root relationship is why polling firms don’t simply interview ten times more people to get ten times more precision.
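The square-root relationship is easy to see in a short sketch (the SD and sample sizes below are arbitrary):

```python
import numpy as np

sd = 15.0                        # assumed spread of individual observations
for n in (100, 400, 1600):       # each step quadruples the sample size
    se = sd / np.sqrt(n)         # standard error of the mean
    print(f"n = {n:5d}  ->  SE = {se:.3f}")
# Quadrupling n only halves the SE: 1.500 -> 0.750 -> 0.375
```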

Confidence intervals wrap the standard error into a range. A 95% confidence interval means that if you repeated the entire study many times, 95% of those intervals would contain the true value. A wide interval signals high uncertainty; a narrow one signals high precision. For reasonably large samples, the 95% confidence interval is roughly the average plus or minus two standard errors; for very small samples the multiplier is somewhat larger.
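Putting the pieces together, here is a hedged sketch of a 95% confidence interval for a sample mean, using simulated data and the t-distribution (which supplies the slightly-larger multiplier for small samples):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=120, scale=15, size=40)    # simulated sample of 40 observations

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))     # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)    # about 2.02 for 39 degrees of freedom

print(f"95% CI: {mean - t_crit * se:.1f} to {mean + t_crit * se:.1f}")
# For samples this size, t_crit is close to 2, so "mean ± 2 SE" is a good approximation.
```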

Margin of Error in Polls

Political polling gives one of the most visible everyday examples of statistical uncertainty. When a poll reports a margin of error, it’s referring to the margin of sampling error: how much the result from interviewing a sample might differ from what you’d find if you could ask every eligible voter.

A typical national poll surveys about 1,000 to 2,000 people. With roughly 132 million voters in a national election, a 1,000-person poll carries a margin of sampling error of about 3.1 percentage points. Double the sample to 2,000, and the margin drops to about 2.2 points. That margin is set at the 95% confidence level, which means one time out of twenty, the true result will fall outside that range even if everything else about the poll is done perfectly. And sampling error is only one source of uncertainty in polls. Question wording, response rates, and models for predicting who will actually vote introduce additional uncertainty that the margin of error doesn’t capture.
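Those figures follow from the standard 95% margin-of-error formula for a proportion. The sketch below reproduces them under the usual worst-case assumption that the true proportion is 50%:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of sampling error for a proportion, in percentage points."""
    return z * math.sqrt(p * (1 - p) / n) * 100

for n in (1000, 2000):
    print(f"n = {n}: ±{margin_of_error(n):.1f} points")
# n = 1000: ±3.1 points
# n = 2000: ±2.2 points
```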

Confidence Intervals vs. Prediction Intervals

When you’re working with a model that predicts one variable from another (say, predicting someone’s blood pressure from their age), two types of intervals come up. A confidence interval tells you the uncertainty in the average prediction: “For all 50-year-olds as a group, we’re fairly sure the average blood pressure falls in this range.” A prediction interval tells you the uncertainty for a single new individual: “For this particular 50-year-old, their blood pressure could fall in this wider range.”

Prediction intervals are always wider than confidence intervals because they account for two things at once: the imprecision of your estimate of the average, plus the natural person-to-person variation around that average. The individual variation is usually the larger of the two components, which is why predicting a single case is always harder than estimating a group average.
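One way to see the difference is to fit an ordinary least-squares model and ask for both intervals at the same input. The sketch below uses simulated age and blood-pressure numbers (purely illustrative) and the statsmodels library, whose get_prediction output reports a confidence interval for the mean and a wider prediction interval for a single observation:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
age = rng.uniform(20, 80, size=200)
bp = 90 + 0.7 * age + rng.normal(0, 10, size=200)   # simulated blood pressures

X = sm.add_constant(age)                            # intercept + age
model = sm.OLS(bp, X).fit()

# Ask about a single new 50-year-old (design row: intercept term, age = 50).
x_new = np.column_stack([np.ones(1), [50.0]])
frame = model.get_prediction(x_new).summary_frame(alpha=0.05)

print(frame[["mean_ci_lower", "mean_ci_upper"]])    # confidence interval for the group average
print(frame[["obs_ci_lower", "obs_ci_upper"]])      # wider prediction interval for one person
```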

How Uncertainties Combine

In many real-world calculations, a final result depends on several measured quantities, each carrying its own uncertainty. If you’re calculating the area of a room, for instance, you multiply length by width, and both measurements have some imprecision. The uncertainty in the final answer reflects the combined uncertainties of all the inputs.

For quantities that are multiplied together, the percentage uncertainties combine in a specific way: you square each one, add the squares, and take the square root of the total. This means the largest single source of uncertainty tends to dominate the result. If your length measurement has 5% uncertainty and your width has 1%, the combined uncertainty is close to 5%, not 6%. This principle, called propagation of uncertainty, is why scientists focus their effort on reducing whichever input carries the most uncertainty rather than trying to improve every measurement equally.
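A minimal sketch of combining percentage uncertainties in quadrature, using the 5% and 1% figures from the example:

```python
import math

def combined_pct_uncertainty(*pct_uncertainties):
    """Combine percentage uncertainties of multiplied quantities in quadrature."""
    return math.sqrt(sum(u ** 2 for u in pct_uncertainties))

length_pct = 5.0   # 5% uncertainty in the length measurement
width_pct = 1.0    # 1% uncertainty in the width measurement
print(f"area uncertainty: {combined_pct_uncertainty(length_pct, width_pct):.1f}%")
# about 5.1%, dominated by the larger input, not 6%
```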

Statistical Significance and Its Limits

One of the most widely used (and widely misunderstood) ways to express uncertainty is the p-value. A p-value below 0.05 has long been treated as the dividing line between a “significant” and “non-significant” result. But that threshold is a convention, not a law of nature, and it has serious limitations.

A p-value tells you the probability of seeing a result at least as extreme as yours if there were truly no effect. It does not tell you the probability that your hypothesis is correct, and it does not tell you whether the effect is large enough to matter in practice. A drug study might find a statistically significant increase in survival, but if that increase is five months while a competing treatment offers five years, statistical significance alone says nothing about which option matters more. Sample size and measurement variability heavily influence whether a result crosses the 0.05 threshold, so a non-significant result doesn’t necessarily mean a treatment is useless, and a significant result doesn’t necessarily mean it’s meaningful.
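A small simulation with entirely made-up numbers shows how a large sample can make a tiny effect “significant”:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 50_000                                    # very large groups
control = rng.normal(60.0, 12.0, size=n)      # simulated outcomes, e.g. survival in months
treated = rng.normal(60.5, 12.0, size=n)      # true effect: only half a month

t_stat, p_value = stats.ttest_ind(treated, control)
print(f"p-value:         {p_value:.2g}")                          # comfortably below 0.05
print(f"observed effect: {treated.mean() - control.mean():.2f} months")
# Statistically significant, yet the effect may be far too small to matter in practice.
```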

This gap between statistical significance and practical importance has led to growing calls for more flexible approaches. Many researchers now argue that the significance threshold should be adapted to the context of a study (its design, sample size, prior evidence, and the consequences of being wrong) rather than applied as a rigid universal cutoff. A more nuanced, context-dependent approach to interpreting uncertainty tends to produce more reliable and reproducible science.

Reading Error Bars on Charts

When you encounter a bar chart or line graph in a news article or research paper, the thin lines extending above and below each data point are error bars. They’re one of the most common visual representations of uncertainty, but they can represent at least four different things: the range of the data, the standard deviation, the standard error, or a confidence interval. Each one has a different width, and each one answers a different question.

Range bars simply show the gap between the highest and lowest data points. Standard deviation bars show how spread out the individual observations are. Standard error bars show the precision of the estimated average, and confidence interval bars show a plausible range for the true average. If a graph doesn’t label which type of error bar it’s using, you can’t interpret it properly. Standard error bars will always be smaller than standard deviation bars for the same data (since SE equals SD divided by the square root of the sample size), so a graph using SE bars can make results look more precise than they really are. Look for a caption or legend that specifies the type before drawing conclusions.
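To see how much the choice matters, the sketch below (using simulated data and matplotlib) draws the same two group averages twice, once with SD bars and once with SE bars; the SE bars look far tighter even though the underlying observations are identical.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
group_a = rng.normal(10, 3, size=30)   # simulated observations
group_b = rng.normal(12, 3, size=30)

means = [group_a.mean(), group_b.mean()]
sds = [group_a.std(ddof=1), group_b.std(ddof=1)]
ses = [sd / np.sqrt(30) for sd in sds]                 # SE = SD / sqrt(n)

fig, axes = plt.subplots(1, 2, sharey=True)
axes[0].bar(["A", "B"], means, yerr=sds, capsize=5)    # spread of individual observations
axes[0].set_title("Standard deviation bars")
axes[1].bar(["A", "B"], means, yerr=ses, capsize=5)    # precision of the estimated averages
axes[1].set_title("Standard error bars")
plt.show()
```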