Statistical error is the difference between a measured or estimated value and the true value it’s trying to represent. It doesn’t mean someone made a mistake. In statistics, “error” refers to the natural gap that opens up whenever you use incomplete data to draw conclusions about something larger, like estimating the average income of a country by surveying a thousand people. Some amount of error is unavoidable in virtually every study, poll, or experiment. Understanding the types of error and how they work helps you judge whether a finding is trustworthy or built on shaky ground.
Why “Error” Doesn’t Mean “Mistake”
The word trips people up because in everyday language, an error is a blunder. In statistics, it’s something different: the built-in uncertainty that comes from working with samples, instruments, and probability rather than perfect, complete information. A political poll that estimates a candidate’s support at 52% when the true figure is 50.5% contains statistical error, but nobody necessarily did anything wrong. The error arises from the fact that pollsters asked 1,200 people instead of every voter in the country.
That said, genuine human mistakes do happen in data collection. An interviewer records the wrong answer, a sensor drifts out of calibration, respondents misunderstand a survey question. These are real blunders, but statisticians treat them as a separate category (non-sampling errors) precisely because they can, in theory, be prevented. Statistical error in the strict sense cannot be eliminated. It can only be measured and minimized.
Sampling Error vs. Non-Sampling Error
The broadest way to split statistical error is into two buckets: sampling error and non-sampling error.
Sampling error exists purely because you’re looking at a sample instead of the entire population. If you could survey every single person or measure every single item, sampling error would be zero. It reflects the luck of the draw: which individuals happened to land in your sample and how well they mirror the whole group. Sampling error tends to grow when the sample is too small, when the proportions of different groups in the sample don’t match the population, or when the selection method isn’t truly random.
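A quick simulation makes this concrete. The sketch below uses an entirely made-up "population" of incomes; the point is only that the spread of sample means (which is exactly what sampling error measures) shrinks as the sample gets bigger:

```python
import random
import statistics

random.seed(42)

# A made-up "population": 100,000 incomes from a skewed distribution.
population = [random.lognormvariate(10, 0.5) for _ in range(100_000)]
true_mean = statistics.mean(population)

# For each sample size, draw many random samples and measure how much
# the sample mean bounces around from draw to draw. That spread IS the
# sampling error, and it shrinks as n grows.
spread = {}
for n in (50, 500, 5000):
    means = [statistics.mean(random.sample(population, n)) for _ in range(200)]
    spread[n] = statistics.stdev(means)
    print(f"n={n:>4}: spread of sample means ≈ {spread[n]:,.0f}")
```

Running this shows each tenfold increase in sample size cutting the spread by roughly a factor of three, which previews the square-root relationship discussed below.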
Non-sampling error is everything else that distorts your data. It can creep in at any stage, from design to data entry, and it affects surveys and full censuses alike. Common forms include:
- Coverage error: leaving out people who should have been included, or accidentally counting someone twice.
- Non-response error: certain types of people refuse to participate, skewing the results.
- Response error: people give inaccurate answers, sometimes because the question is confusing, sometimes because they want to look good.
- Processing error: data gets garbled during entry, coding, or analysis.
The tricky part about non-sampling error is that it’s hard to detect and even harder to quantify. Sampling error, by contrast, follows well-developed mathematical theory and can be estimated precisely.
How Standard Error Measures Precision
When researchers report a result, they typically attach a standard error to it. The standard error tells you how much the sample estimate would bounce around if you repeated the study many times with fresh samples. A small standard error means the estimate is precise; a large one means it’s wobbly.
The formula is straightforward: standard error equals the standard deviation divided by the square root of the sample size. This relationship has an important practical consequence. To cut your standard error in half, you need to quadruple your sample size, not merely double it. That’s why going from 100 participants to 400 makes a noticeable improvement in precision, but going from 10,000 to 40,000 often isn’t worth the cost.
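The halving-requires-quadrupling rule falls straight out of the formula. A minimal sketch, using hypothetical reaction-time data:

```python
import math
import statistics

# Hypothetical data: reaction times (ms) from a small pilot sample.
sample = [512, 498, 530, 471, 509, 524, 488, 503, 517, 495]

sd = statistics.stdev(sample)   # sample standard deviation
n = len(sample)
se = sd / math.sqrt(n)          # standard error of the mean

# Quadrupling n halves the standard error (assuming sd stays the same):
se_4n = sd / math.sqrt(4 * n)
print(f"SE at n={n}: {se:.2f}")
print(f"SE at n={4 * n}: {se_4n:.2f}")
```

Because the sample size sits under a square root, dividing the error by 2 means multiplying n by 2² = 4.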
The standard error also feeds directly into the margin of error you see in news polls. That familiar "plus or minus 3 points" is calculated by multiplying the standard error by a critical value (about 1.96 for 95% confidence). So when a poll says a candidate leads 48% to 45% with a 3-point margin of error, the lead is shakier than it looks: the margin applies to each candidate's number separately, so the uncertainty on the gap between them is roughly twice as large, and the race could plausibly be tied or even reversed.
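As a sketch of the arithmetic behind "plus or minus 3 points" (the support level and sample size here are made up; this uses the standard formula for the standard error of a proportion):

```python
import math

# Hypothetical poll: 48% support among 1,067 respondents.
p = 0.48
n = 1067

# Standard error of a sample proportion, then the margin of error
# at 95% confidence (critical value of about 1.96).
se = math.sqrt(p * (1 - p) / n)
margin = 1.96 * se
print(f"margin of error ≈ ±{margin * 100:.1f} points")  # ≈ ±3.0
```

A sample of roughly a thousand people is no accident in polling: it is about what the formula requires to get the margin down to 3 points.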
Type I and Type II Errors
In hypothesis testing, statistical error takes on a more specific meaning. Researchers set up a test with a default assumption (the null hypothesis, usually “there’s no effect”) and then check whether the data provide enough evidence to reject it. Two things can go wrong.
A Type I error, or false positive, happens when you reject the null hypothesis even though it’s actually true. You conclude a drug works when it doesn’t, or that two groups differ when the difference was just noise. The courtroom analogy: convicting an innocent person. Researchers control this risk by setting a significance level before the study begins. The conventional threshold is 0.05, meaning they accept up to a 5% chance of a false positive.
A Type II error, or false negative, is the opposite. You fail to detect a real effect. The drug actually works, but your study missed it. The courtroom version: letting a guilty person go free. The probability of a Type II error is called beta, and the flip side, the probability of correctly detecting a real effect, is called statistical power (1 minus beta). Most clinical trials aim for at least 80% power, which means accepting up to a 20% chance of missing a genuine effect.
These two errors pull against each other. Making it harder to commit a Type I error (lowering your significance threshold) automatically makes Type II errors more likely unless you also increase your sample size. Designing a study always involves balancing these risks.
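The trade-off can be demonstrated with a toy simulation. Everything here is invented for illustration: a simple two-group comparison with known standard deviation, tested with a z-style statistic at two different significance thresholds. Tightening the threshold from 0.05 to 0.005 cuts false positives but also cuts power:

```python
import math
import random
import statistics

random.seed(7)

def study_rejects(effect, n, z_crit):
    """Simulate one two-group study (known sd = 1) and report whether
    a simple z-test rejects the null at the given critical value."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(effect, 1) for _ in range(n)]
    se = math.sqrt(2 / n)
    z = (statistics.mean(b) - statistics.mean(a)) / se
    return abs(z) > z_crit

trials = 2000
results = {}
for label, z_crit in (("alpha=0.05", 1.96), ("alpha=0.005", 2.81)):
    # Effect of 0: any rejection is a Type I error (false positive).
    false_pos = sum(study_rejects(0.0, 50, z_crit) for _ in range(trials)) / trials
    # Effect of 0.5: the rejection rate is the study's power.
    power = sum(study_rejects(0.5, 50, z_crit) for _ in range(trials)) / trials
    results[label] = (false_pos, power)
    print(f"{label}: false-positive rate {false_pos:.3f}, power {power:.3f}")
```

With the sample size held fixed, the stricter threshold buys fewer false positives at the direct cost of more missed real effects.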
The 0.05 Threshold and Its Critics
The idea that a result is "statistically significant" if its p-value falls below 0.05 dates back roughly a century. The original intent was modest: a p-value below 0.05 simply flagged a result as worth a closer look. It was never meant to be a stamp of scientific truth, but over decades it became exactly that in practice.
In 2016, the American Statistical Association took the unusual step of releasing a formal statement on p-values for the first time in its 177-year history. Among its six principles: scientific conclusions should not be based solely on whether a p-value passes a specific threshold, and the 0.05 cutoff is “conventional and arbitrary.” The statement urged researchers to look at effect sizes, confidence intervals, and the full context of a study rather than treating significance as a binary pass/fail verdict. Some researchers have since proposed lowering the threshold to 0.005 to reduce false positives, while others argue for abandoning fixed thresholds altogether.
How to Reduce Statistical Error
You can’t eliminate statistical error, but several strategies shrink it to a manageable size.
The most direct lever is sample size. Larger samples produce smaller standard errors and tighter confidence intervals. This is why national health surveys recruit tens of thousands of participants while a pilot study might get by with a few dozen.
Stratified sampling divides the population into subgroups (by age, region, income, or whatever matters for the question at hand) and then samples from each subgroup in proportion. This ensures the sample mirrors the population’s composition instead of leaving it to chance. Systematic sampling, where you select every nth person from a list, can also improve consistency.
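A minimal sketch of proportional stratified sampling, using an invented population with made-up age-group proportions:

```python
import random

random.seed(1)

# Hypothetical population: (age group, outcome value) pairs, with
# proportions 30% / 50% / 20% across the three groups.
population = (
    [("18-34", random.gauss(55, 10)) for _ in range(3000)]
    + [("35-64", random.gauss(70, 10)) for _ in range(5000)]
    + [("65+", random.gauss(62, 10)) for _ in range(2000)]
)

def stratified_sample(pop, total_n):
    """Sample from each stratum in proportion to its population share."""
    strata = {}
    for group, value in pop:
        strata.setdefault(group, []).append(value)
    sample = []
    for members in strata.values():
        k = round(total_n * len(members) / len(pop))
        sample.extend(random.sample(members, k))
    return sample

sample = stratified_sample(population, 500)
print(len(sample))  # 150 + 250 + 100 = 500, matching the 30/50/20 mix
```

Unlike a simple random sample, this guarantees the 30/50/20 age mix by construction rather than leaving it to chance.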
Randomization is the single most important defense against bias. When every member of the population has an equal chance of being selected, the sample is far less likely to be skewed in a particular direction. Weighted adjustments after data collection can also correct for known imbalances, such as when younger people are underrepresented because they were harder to reach.
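Weighted adjustment can be sketched in a few lines. The shares and responses below are invented; each group's responses get weight (population share ÷ sample share), so underrepresented groups count for more:

```python
# Hypothetical survey where young respondents were underrepresented.
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}
sample_share     = {"18-34": 0.15, "35-64": 0.55, "65+": 0.30}

weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Made-up responses: (age group, support for a policy on a 0-1 scale).
responses = [("18-34", 0.8), ("18-34", 0.6), ("35-64", 0.4),
             ("35-64", 0.5), ("65+", 0.3), ("65+", 0.2)]

unweighted = sum(y for _, y in responses) / len(responses)
weighted = (sum(weights[g] * y for g, y in responses)
            / sum(weights[g] for g, _ in responses))
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

Here the young (and more supportive) respondents get weight 2.0, so the weighted estimate lands noticeably higher than the raw sample mean.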
For non-sampling errors, the strategies are more procedural: training interviewers to be neutral, pretesting survey questions for clarity, building quality checks into data entry, and following up with non-respondents. These steps don’t follow neat mathematical formulas, but they can prevent the kinds of systematic distortions that no amount of sample size can fix.
Bias vs. Variance in Predictions
When statistical error shows up in predictive modeling, it splits into two components: bias and variance. Bias is the error that comes from oversimplifying. If you try to predict housing prices using only square footage, your model will consistently miss factors like location and condition, producing estimates that are systematically off in one direction. Variance is the error that comes from overcomplicating. A model that’s too flexible will fit the quirks of one particular dataset beautifully but give wildly different predictions on new data.
Total prediction error is roughly the sum of bias and variance (plus irreducible noise from the real world). Reducing one tends to increase the other. A simple model has high bias and low variance; a complex model has low bias and high variance. Finding the sweet spot between them is one of the central challenges in machine learning and applied statistics alike.
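The decomposition can be seen directly in a toy simulation. The setup below is entirely invented: the true relationship is y = 2x plus noise, the "too simple" model predicts the global mean of y (ignoring x), and the "too flexible" model copies the nearest training point (1-nearest-neighbor). Refitting each model on many fresh training sets and checking its predictions at one test point separates bias from variance:

```python
import random
import statistics

random.seed(3)

def true_f(x):
    return 2 * x  # made-up ground truth

def train_set(n=20):
    xs = [random.uniform(0, 1) for _ in range(n)]
    return [(x, true_f(x) + random.gauss(0, 0.5)) for x in xs]

x0 = 0.9  # fixed test point

def predict_mean(data, x):
    # "Too simple": ignore x entirely and predict the average y.
    return statistics.mean(y for _, y in data)

def predict_1nn(data, x):
    # "Too flexible": copy the y of the single nearest training point.
    return min(data, key=lambda p: abs(p[0] - x))[1]

results = {}
for name, model in (("mean-only", predict_mean), ("1-NN", predict_1nn)):
    preds = [model(train_set(), x0) for _ in range(500)]
    bias2 = (statistics.mean(preds) - true_f(x0)) ** 2
    var = statistics.variance(preds)
    results[name] = (bias2, var)
    print(f"{name:9s}: bias^2={bias2:.3f}, variance={var:.3f}")
```

The mean-only model misses systematically at x = 0.9 (high bias) but barely moves between training sets (low variance); the 1-NN model is nearly unbiased but inherits the full noise of whichever point it happens to copy (high variance).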

