What Is Bias in Statistics: Definition and Types

Bias in statistics is a systematic error that pushes results away from the truth in a consistent direction. Unlike random error, which scatters measurements unpredictably and tends to cancel out with larger samples, bias pulls every estimate too high or too low. It can enter a study through how participants are chosen, how data is collected, how questions are asked, or how results are analyzed and reported. Understanding where bias comes from is the first step to recognizing when a statistic deserves your trust and when it doesn’t.

How Bias Differs From Random Error

Imagine a bathroom scale that reads two pounds heavy every time you step on it. Your readings will cluster tightly around the wrong number. That’s bias: a consistent, directional mistake. Now imagine a scale that fluctuates randomly, sometimes reading a pound high and sometimes a pound low. That’s random error, or variance. With enough measurements, random error averages out. Bias does not. You could weigh yourself a thousand times on that broken scale and your average would still be two pounds off.

Formally, bias is the gap between the value a method is expected to produce and the true value it’s trying to estimate. If a statistical method’s expected output equals the true value, it’s called “unbiased.” If not, the size of that gap is the bias. This matters because the expected total error of an estimate, its mean squared error, decomposes into two parts: bias squared plus variance. Reducing one often increases the other, a tension statisticians call the bias-variance tradeoff. A very flexible model might have low bias but wild variance. A rigid model might be stable but consistently off-target. Good statistical practice means finding the balance point where total error is smallest.
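The broken-scale example makes this concrete. The short simulation below (all numbers invented for illustration) shows that averaging many readings cancels random error but not bias, and checks the decomposition of total error into bias squared plus variance:

```python
import random

random.seed(42)

TRUE_WEIGHT = 150.0   # the quantity the scale is trying to measure
N = 100_000           # number of repeated weigh-ins

def mean(xs):
    return sum(xs) / len(xs)

# Biased scale: reads 2 lbs heavy every time, plus small random noise.
biased = [TRUE_WEIGHT + 2.0 + random.gauss(0, 1) for _ in range(N)]
# Unbiased scale: the same random noise, no systematic offset.
unbiased = [TRUE_WEIGHT + random.gauss(0, 1) for _ in range(N)]

m = mean(biased)
print(f"average of biased readings:   {m:.2f}")               # stays near 152
print(f"average of unbiased readings: {mean(unbiased):.2f}")  # converges to 150

# Total error (mean squared error) = bias^2 + variance, exactly.
bias = m - TRUE_WEIGHT
variance = mean([(x - m) ** 2 for x in biased])
mse = mean([(x - TRUE_WEIGHT) ** 2 for x in biased])
print(f"bias^2 + variance  = {bias ** 2 + variance:.4f}")
print(f"mean squared error = {mse:.4f}")
```

No matter how large N grows, the biased average never drifts back toward 150; only the variance term shrinks.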

Selection Bias

Selection bias happens when the people or data points in a study don’t represent the population you actually care about. It’s one of the most common and most damaging forms of bias because it distorts results before a single measurement is taken.

Sampling bias is a specific type of selection bias that occurs when participants are chosen in a non-random way. If a health survey recruits only through email, it excludes people without internet access, who may differ in age, income, and health status from those who respond. The results look clean but describe the wrong population. A study comparing diabetes screening rates at two clinics, for instance, could produce misleading differences if the clinics serve communities with different income levels, and the researchers don’t account for that.
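The email-only survey can be simulated in a few lines. Assume, purely for illustration, that 70% of the population is online, and that the health score being surveyed differs between the online and offline groups:

```python
import random

random.seed(0)

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical population: 70% have internet access, 30% do not,
# and the two groups differ on the health score being surveyed.
# All numbers here are invented for illustration.
online = [random.gauss(75, 10) for _ in range(70_000)]
offline = [random.gauss(60, 10) for _ in range(30_000)]
population = online + offline

print(f"true population mean: {mean(population):.1f}")   # ~70.5

# Email-only recruitment reaches only the online group.
email_sample = random.sample(online, 1_000)
print(f"email-survey mean:    {mean(email_sample):.1f}")  # biased high, ~75

# A genuine random sample of the whole population.
random_sample = random.sample(population, 1_000)
print(f"random-sample mean:   {mean(random_sample):.1f}")
```

The email sample is internally consistent and looks precise, but it estimates the online group's mean, not the population's. Drawing a larger email sample would only make the wrong number more precise.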

A subtler form, sometimes called Berkson’s bias, shows up when researchers draw study participants from hospitals. Hospitalized patients are sicker than the general population, so comparing them to non-hospitalized people introduces a systematic skew. In one well-known example, a hospital-based study found no significant link between smoking and bladder cancer, a relationship that’s firmly established in the broader population. The problem: both the cancer patients and the comparison group had extensive smoking histories, because heavy smokers are more likely to be hospitalized for various reasons. The bias erased a real effect.

Information and Measurement Bias

Even with a perfectly representative sample, bias can creep in through how data is collected. This broad category includes any systematic distortion in measurement or reporting.

Recall bias is common in studies that ask people to remember past exposures. In a study comparing people with a disease to healthy controls, those who are sick tend to search their memory harder for possible causes. They’re more likely to remember and report risk factors, while healthy participants forget or dismiss them. The result is an inflated apparent link between the exposure and the disease.

Observer bias, a close cousin of confirmation bias, occurs when the person collecting or interpreting data is influenced by what they expect to find. A researcher who believes a treatment works might unconsciously record ambiguous outcomes more favorably in the treatment group.

Instrument bias is more mechanical. A blood pressure cuff that consistently reads five points high, a thermometer that hasn’t been calibrated, or a survey question with leading language will all shift results in one direction. Unlike random measurement noise, these errors don’t cancel out with more data. They compound.

Confounding

A confounding variable is something that influences both the thing you’re studying and the outcome you’re measuring, creating the illusion of a relationship that may not exist, or hiding one that does. The classic example: ice cream sales and drowning deaths both rise in summer. Ice cream doesn’t cause drowning. Hot weather drives both.

Confounding is one of the main reasons that correlation doesn’t equal causation. If a study finds that coffee drinkers have higher rates of heart disease, but coffee drinkers are also more likely to smoke, smoking is a confounder. Without accounting for it, you’d wrongly blame the coffee.

Researchers handle confounders in two stages. During study design, they use randomization (randomly assigning people to groups so confounders are evenly distributed), restriction (limiting the study to a narrow group where the confounder doesn’t vary), or matching (pairing participants so each group has similar confounder profiles). During analysis, they use stratification, which means splitting the data into subgroups where the confounder is held constant and examining results within each subgroup. For more complex situations with multiple confounders, statistical modeling techniques can adjust for several variables simultaneously.

Publication Bias

Publication bias is the tendency for studies with dramatic or positive results to get published while studies with null or negative results sit in file drawers. This means the published literature on any topic is a skewed sample of all the research that’s been done. When someone reviews “all the evidence,” they’re really reviewing the evidence that made it into print, which systematically overstates effects.

One tool for detecting this is the funnel plot, which graphs the results of many studies against their sample sizes. In an unbiased literature, small studies (which are less precise) should scatter widely around the true effect, while large studies cluster tightly, forming a symmetric funnel shape. When publication bias is present, the funnel looks lopsided because small studies showing weak or negative results are missing from the published record. Statistical tests like Egger’s regression test can quantify this asymmetry. A method called trim and fill goes further, estimating what the overall effect size would look like if the missing studies were added back in.
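The logic behind Egger’s test can be sketched in a short simulation. The version below (all parameters invented, and much simplified relative to the real test) generates studies around a true effect, “publishes” mainly the statistically significant ones, then regresses each study’s standardized effect on its precision; an intercept far from zero signals funnel-plot asymmetry:

```python
import math
import random

random.seed(7)

TRUE_EFFECT = 0.3   # the effect every simulated study is estimating

# Simulate 200 studies of varying size. Each is "published" if its
# result is statistically significant, or otherwise only 20% of the
# time -- a crude model of publication bias. All parameters invented.
published = []
for _ in range(200):
    n = random.randint(20, 2000)
    se = 1 / math.sqrt(n)                   # standard error shrinks with n
    effect = random.gauss(TRUE_EFFECT, se)  # the study's estimated effect
    significant = effect / se > 1.96
    if significant or random.random() < 0.2:
        published.append((effect, se))

# Egger-style regression (simplified): regress the standardized
# effect (effect/se) on precision (1/se). Without publication bias
# the intercept should sit near zero.
ys = [e / s for e, s in published]
xs = [1 / s for _, s in published]
k = len(xs)
mx, my = sum(xs) / k, sum(ys) / k
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx
print(f"published studies: {k} of 200")
print(f"slope (effect estimate): {slope:.2f}")
print(f"Egger intercept: {intercept:.2f}")
```

Small studies survive publication only when their estimates are inflated, which is what drags the intercept away from zero, the same asymmetry a lopsided funnel plot shows visually.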

Survivorship Bias

Survivorship bias is what happens when you draw conclusions from the winners while ignoring the losers, simply because the losers aren’t visible. The most famous illustration comes from World War II. Military commanders wanted to add armor to bomber aircraft but couldn’t cover the entire plane without making it too heavy to fly. They examined returning bombers, noted where the bullet holes were concentrated, and proposed armoring those areas. The statistician Abraham Wald pointed out the flaw: they were only seeing planes that survived. The bullet holes on returning planes marked the spots where a bomber could take damage and still make it home. The areas with no bullet holes were the places where a hit was fatal, because those planes never came back. The armor belonged where the holes weren’t.

This same logic applies everywhere. Studying only successful companies to find the “secrets of success” ignores failed companies that did the same things. Concluding that a medication has few side effects based on patients who stayed on it ignores everyone who quit because of side effects. Whenever your data only includes survivors, the conclusions will be biased toward whatever helped them survive.
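The pattern is easy to reproduce numerically. In this invented model, every “fund” has a true expected annual return of zero, but funds that lose badly in any year disappear from the record, so the survivors’ track records look systematically better than the truth:

```python
import random

random.seed(5)

# Invented model: every fund has a true expected annual return of 0%.
# Funds that lose more than 20% in any single year shut down and
# vanish from the historical record.
survivor_returns = []
for _ in range(10_000):
    annual = [random.gauss(0.0, 0.15) for _ in range(10)]
    if all(r > -0.20 for r in annual):   # only survivors stay visible
        survivor_returns.append(sum(annual) / len(annual))

avg = sum(survivor_returns) / len(survivor_returns)
print(f"surviving funds: {len(survivor_returns)} of 10,000")
print(f"their average annual return: {avg:+.1%}")  # well above the true 0%
```

An analyst who studies only the surviving funds would conclude the industry beats zero, even though by construction no fund here has any edge at all.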

Algorithmic Bias

Statistical bias has taken on new urgency with the rise of machine learning, where algorithms make decisions about loan approvals, hiring, criminal sentencing, and medical diagnosis. These systems learn from training data, and when that data reflects historical discrimination, the algorithm reproduces and sometimes amplifies it.

A hiring algorithm trained on a company’s past decisions will learn the patterns embedded in those decisions. If the company historically hired fewer women, the algorithm learns to penalize indicators of female gender. Amazon discovered exactly this when an internal recruiting tool systematically downgraded resumes that contained the word “women’s” or listed attendance at a women’s college. The algorithm wasn’t given an instruction to discriminate. It found the pattern in the data and optimized for it.

The criminal justice risk tool COMPAS illustrates a different pathway. Trained on historical criminal justice data, the system was found to be nearly twice as likely to incorrectly flag Black defendants as high risk compared to white defendants. The training data carried the biases of past decisions in the justice system, and the algorithm faithfully learned them. Even unsupervised models working with raw data can discover and replicate discriminatory patterns embedded in society, because those patterns are real features of the data. The bias isn’t in the math. It’s in what the math was trained on.

Reducing Bias in Practice

No single technique eliminates all bias, but several strategies target specific sources. Random assignment of participants to groups is the gold standard for balancing both known and unknown confounders. Blinding, where participants and researchers don’t know who received which treatment, curbs observer bias and biased reporting by participants at the same time. Standardized measurement protocols and calibrated instruments address measurement bias at the source.

For observational studies where randomization isn’t possible, propensity score methods offer a workaround. Each participant gets a score based on their characteristics that predict which group they’d naturally fall into. Researchers then compare people with similar scores across groups, mimicking some of the balance that randomization would have provided.
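Here is a toy version of the propensity-score idea, with every number invented. With a single binary covariate (age group), stratifying on the estimated propensity score is the same as stratifying on the covariate itself, which keeps the sketch short:

```python
import random

random.seed(3)

# Invented observational study: older patients are more likely to get
# the treatment AND more likely to have the bad outcome; the treatment
# truly lowers outcome risk by 5 percentage points.
def person():
    older = random.random() < 0.4
    treated = random.random() < (0.7 if older else 0.2)
    base = 0.30 if older else 0.10
    outcome = random.random() < (base - 0.05 if treated else base)
    return older, treated, outcome

people = [person() for _ in range(200_000)]

def rate(rows):
    return sum(o for _, _, o in rows) / len(rows)

# Naive comparison: treatment looks harmful, because treated patients
# are disproportionately older.
naive = (rate([p for p in people if p[1]])
         - rate([p for p in people if not p[1]]))
print(f"naive treated-vs-untreated difference: {naive:+.3f}")

# Propensity score: estimated P(treated | covariates). With one binary
# covariate it's just the treatment rate within each age group.
def propensity(older):
    group = [p for p in people if p[0] == older]
    return sum(t for _, t, _ in group) / len(group)

print(f"propensity scores: older {propensity(True):.2f}, "
      f"younger {propensity(False):.2f}")

# Compare treated vs untreated within each propensity stratum, then
# average the stratum effects weighted by stratum size.
effect = 0.0
for older in (True, False):
    stratum = [p for p in people if p[0] == older]
    diff = (rate([p for p in stratum if p[1]])
            - rate([p for p in stratum if not p[1]]))
    effect += diff * len(stratum) / len(people)
print(f"propensity-adjusted difference: {effect:+.3f}")
```

The naive comparison makes the treatment look harmful; comparing within propensity strata recovers the benefit that was built into the simulation. Real applications estimate the score with a regression model over many covariates, but the underlying move is the same: only compare people who were similarly likely to end up in each group.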

The American Statistical Association’s ethical guidelines, updated in 2022, place bias mitigation at the center of responsible practice. They call on statisticians to communicate known biases in their data sources, be transparent about assumptions and possible sources of error, disclose when multiple comparisons are conducted, and share data and methods to enable replication. Notably, the guidelines warn against reporting only results that conform to expectations, recognizing that selective reporting is itself a form of bias that corrupts the scientific record.