Sampling bias is a systematic error that occurs when the people (or data points) included in a study don’t accurately represent the larger population the study claims to describe. It happens when a sampling method consistently favors some groups over others, meaning the conclusions drawn from that sample only truly apply to the slice of the population that was actually captured. This is one of the most common and consequential flaws in research, polls, medical trials, and, increasingly, the datasets used to train artificial intelligence.
How Sampling Bias Works
Every study starts with a target population: the full group of people you want to learn about. Because studying everyone is usually impossible, researchers select a sample. If that sample is chosen in a way that systematically excludes certain types of people, the results will be skewed in a predictable direction. The key word is “systematically.” Random errors bounce around and tend to cancel out over large samples. Sampling bias doesn’t cancel out. It pushes results in one direction every time, no matter how large the sample gets.
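A quick simulation makes the distinction concrete. The numbers below are invented for illustration: 30% of people in a hypothetical population are weekend diners, and the biased method reaches them only half as often as everyone else. The random error in the unbiased estimate shrinks as the sample grows; the biased estimate stays wrong no matter how large the sample gets.

```python
import random

random.seed(0)

# Invented illustration: 30% of the target population are weekend diners.
TRUE_RATE = 0.30

def unbiased_estimate(n):
    """Simple random sampling: every person is equally likely to be picked.
    Random error shrinks as n grows."""
    hits = sum(random.random() < TRUE_RATE for _ in range(n))
    return hits / n

def biased_estimate(n):
    """Weekday-lunch sampling: weekend diners are only half as likely
    to be reached, so the estimate is pushed down regardless of n."""
    responses = []
    while len(responses) < n:
        is_weekend = random.random() < TRUE_RATE
        reach_prob = 0.5 if is_weekend else 1.0
        if random.random() < reach_prob:
            responses.append(is_weekend)
    return sum(responses) / n

big_unbiased = unbiased_estimate(100_000)
big_biased = biased_estimate(100_000)
print(f"true rate:                  {TRUE_RATE}")
print(f"unbiased estimate (n=100k): {big_unbiased:.3f}")  # close to 0.30
print(f"biased estimate   (n=100k): {big_biased:.3f}")    # stuck near 0.18
```

Even with 100,000 respondents, the biased method converges to the wrong answer (about 0.176 under these assumptions), because the error is systematic rather than random.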
A simple example: surveying restaurant customers about dining habits by handing out questionnaires only during weekday lunches. That method systematically excludes people who eat out primarily on evenings and weekends. The sample might be large, but it will never represent the full population of diners. The inferences only apply to the subpopulation that was actually sampled.
Common Types of Sampling Bias
Sampling bias shows up in several distinct patterns, each with its own mechanism.
Self-selection (voluntary response) bias occurs when people choose whether to participate. Those with strong opinions or extreme experiences are more likely to respond. A patient satisfaction simulation study illustrates this clearly: the average rating from a biased, self-selected sample was 0.12 points higher than the true average across all patients. That might sound small, but it represented nearly a full standard deviation of difference. The distortion was worst for the lowest-performing physicians, whose biased-sample ratings were inflated almost twice as much as those of top-performing physicians. Voluntary surveys consistently skew toward people who are either very happy or very unhappy, missing the moderate middle.
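The mechanism can be sketched in a few lines. The response probabilities below are invented for illustration (they are not the cited study's parameters): patients with extreme experiences are far more likely to volunteer a rating, which inflates the observed average.

```python
import random

random.seed(1)

# Invented response model: patients rate satisfaction 1-5, and those
# with extreme experiences are much more likely to respond.
RESPONSE_PROB = {1: 0.5, 2: 0.2, 3: 0.1, 4: 0.2, 5: 0.6}

population = [random.randint(1, 5) for _ in range(100_000)]
volunteers = [s for s in population if random.random() < RESPONSE_PROB[s]]

true_mean = sum(population) / len(population)
volunteer_mean = sum(volunteers) / len(volunteers)
print(f"true mean rating:      {true_mean:.3f}")
print(f"volunteer mean rating: {volunteer_mean:.3f}")  # inflated by self-selection
```

Because the happiest patients respond most often in this model, the volunteer average lands above the true average, even though every individual rating is honest.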
Undercoverage bias happens when your sampling method physically cannot reach certain groups. Online-only surveys are a prime modern example. Data from the CDC’s 2017 Behavioral Risk Factor Surveillance System shows that nearly 50% of adults aged 75 and older were not internet users. Among people without a high school diploma, 45% were offline. Among those living below the federal poverty level, about 30% had no internet access. Hispanic and Black respondents were also roughly twice as likely as white respondents to be non-internet users. Any web-only survey systematically misses these groups, and the resulting data will undercount their health conditions, behaviors, and opinions.
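A back-of-the-envelope calculation shows how this undercounting works. The population share and prevalence figures below are invented for illustration; only the ~50% offline rate for the 75+ group echoes the CDC figures above.

```python
# Hypothetical arithmetic (prevalence figures and shares invented).
share_75plus = 0.20                    # assumed population share of adults 75+
prev_75plus, prev_other = 0.40, 0.10   # assumed prevalence of some condition

true_prev = share_75plus * prev_75plus + (1 - share_75plus) * prev_other

# A web-only survey reaches ~50% of the 75+ group and, say, 90% of others.
reach_75plus, reach_other = 0.50, 0.90
covered_old = share_75plus * reach_75plus        # 0.10 of the population
covered_rest = (1 - share_75plus) * reach_other  # 0.72 of the population
survey_prev = ((covered_old * prev_75plus + covered_rest * prev_other)
               / (covered_old + covered_rest))

print(f"true prevalence:       {true_prev:.1%}")    # 16.0%
print(f"web-survey prevalence: {survey_prev:.1%}")  # about 13.7%
```

The survey undercounts the condition simply because the group most affected by it is the group least likely to be online.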
Survivorship bias is the error of drawing conclusions only from the people or things that made it through a selection process, ignoring those that didn’t. The most famous illustration comes from World War II, when military analysts studied returning bombers to decide where to add armor. The planes that came back had bullet holes concentrated in certain areas. Mathematician Abraham Wald realized the damage pattern on surviving planes actually showed where planes could take hits and still fly. The planes that were hit in the other areas never made it back. The armor needed to go precisely where the returning planes were undamaged.
Convenience sampling bias comes from studying whoever is easiest to reach, like a psychology study that recruits only from undergraduate classes, or a medical study that draws patients from a single hospital. The sample reflects that specific, accessible group rather than the broader population.
Why It Matters in Medical Research
Sampling bias in clinical trials has direct consequences for patient care. Trials are often conducted in highly selected groups of patients whose characteristics may differ substantially from the broader population that will eventually use the treatment. When the mix of people in a trial doesn’t match the real-world population, the treatment effect measured in the study can be meaningfully different from what patients will actually experience.
Concrete examples make this tangible. Researchers analyzing substance use treatment trials found that estimates of how well therapies like medication-assisted treatment and motivational interviewing worked would have been very different (typically showing smaller effects that were no longer statistically significant) if the trials had enrolled samples more representative of all treatment-eligible people in the United States. In another case, studies of antidepressant effects on suicidal thoughts in young people may have overstated the risk because the trials under-enrolled or explicitly excluded the youth who were at highest risk for those outcomes.
The core issue is that if a trial only enrolls adults under 50, it cannot tell you how a treatment works in older adults without assuming that age doesn’t change the treatment’s effect. That’s a strong assumption, and it’s often wrong. For trial results to apply broadly, studies need to enroll a full spectrum of patients.
Sampling Bias in AI and Algorithms
The same principle scales up dramatically in machine learning. Models trained on unrepresentative datasets can fail when they encounter people who were underrepresented in the training data. A model trained to recommend treatments for chronic disease using a dataset of mostly male patients, for instance, may make incorrect predictions for female patients when deployed in a hospital.
One traditional fix is dataset balancing: removing data points until all subgroups are equally represented. The problem is that this often requires throwing away huge amounts of data, which hurts the model’s overall accuracy. Researchers at MIT developed a more targeted approach. Instead of removing data until everything is equal, their technique identifies the specific data points in a training set that contribute most to a model’s failures on underrepresented groups. By removing far fewer points (in one test, about 20,000 fewer than conventional balancing), they improved accuracy for minority subgroups without sacrificing overall performance. Their method can even detect hidden sources of bias in datasets where the subgroups aren’t labeled.
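A far more modest first diagnostic, unrelated to the MIT technique itself, is simply to break a model's accuracy out by subgroup after training. The toy labels below are invented; an imbalance like this one is often the first visible symptom of unrepresentative training data.

```python
from collections import defaultdict

def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy broken out by subgroup: a basic audit for the kind
    of training-data imbalance described above."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += (t == p)
    return {g: hits[g] / totals[g] for g in totals}

# Toy example: a model that looks fine overall but fails completely
# on the underrepresented group "B".
y_true = [1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "A", "B", "B"]
print(subgroup_accuracy(y_true, y_pred, groups))  # A: 1.0, B: 0.0
```

Overall accuracy here is 75%, which hides the fact that the model gets every member of group "B" wrong, exactly the failure mode that aggregate metrics conceal.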
How Researchers Correct for It
The gold standard for preventing sampling bias is random selection: every member of the target population has a known, nonzero chance of being chosen. Simple random sampling is the purest form, but practical research often uses stratified sampling, where the population is divided into meaningful subgroups (by age, geography, income) and random samples are drawn from each. This ensures that important subgroups aren’t accidentally left out.
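A minimal sketch of stratified sampling, assuming hypothetical population records tagged with an age group:

```python
import random

random.seed(2)

def stratified_sample(population, strata_key, n_per_stratum):
    """Divide the population into strata, then draw a simple random
    sample from each, so no subgroup is accidentally left out."""
    strata = {}
    for person in population:
        strata.setdefault(strata_key(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, min(n_per_stratum, len(members))))
    return sample

# Hypothetical records: (person_id, age_group), with 65+ a small minority.
population = [(i, "18-39" if i % 10 < 6 else "40-64" if i % 10 < 9 else "65+")
              for i in range(1000)]
sample = stratified_sample(population, strata_key=lambda p: p[1],
                           n_per_stratum=50)
```

Even though the 65+ group is only 10% of this toy population, the stratified draw guarantees it 50 members in the sample, whereas a small simple random sample might capture very few.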
When perfect random sampling isn’t possible, which is most of the time, researchers can apply corrections after the data is collected. The most common technique is called post-stratification or weighting. The idea is straightforward: if your sample is 10% elderly adults but the actual population is 20% elderly, you give each elderly respondent’s answers more weight in the final calculations. The sample is divided into cells based on known population characteristics, and each cell is weighted to match its true proportion in the population. A related method called raking adjusts for multiple variables at once, looping through each variable and reweighting iteratively until the weights stabilize.
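The weighting idea from the example above can be written out in a few lines; the response values are invented for illustration.

```python
# Post-stratification sketch: the sample is 10% elderly, but the
# population is 20% elderly. Responses are (group, rating) pairs.
sample = [("elderly", 4.0)] * 10 + [("other", 3.0)] * 90

population_share = {"elderly": 0.20, "other": 0.80}
sample_share = {"elderly": 0.10, "other": 0.90}

# weight = population proportion / sample proportion
weights = {g: population_share[g] / sample_share[g] for g in population_share}
# elderly respondents count double (2.0); others are down-weighted (~0.89)

raw_mean = sum(v for _, v in sample) / len(sample)
weighted_mean = (sum(weights[g] * v for g, v in sample)
                 / sum(weights[g] for g, _ in sample))
print(f"raw mean:      {raw_mean:.2f}")       # 3.10
print(f"weighted mean: {weighted_mean:.2f}")  # 3.20
```

The weighted mean matches what a perfectly proportioned sample would have produced (0.20 × 4.0 + 0.80 × 3.0 = 3.20). Raking extends the same proportional adjustment to several variables at once, looping through each in turn until the weights stabilize.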
These corrections help, but they have limits. Weighting can only adjust for differences you know about and can measure. If your sample is missing an entire group for reasons you haven’t identified, no statistical technique can fill that gap. The best protection is still thoughtful study design that anticipates which groups might be excluded and takes steps to include them from the start.
How to Spot It as a Reader
You don’t need a statistics degree to notice sampling bias. When you encounter a study, poll, or statistic, ask three questions. First, who was actually included? A national health claim based on data from a single university hospital should raise a flag. Second, who might have been left out, and could they be different from the people who were included? If a survey about work-life balance only reached people with stable internet access during business hours, it missed a lot of workers. Third, did people choose to participate, or were they randomly selected? Voluntary participation almost always skews results.
The size of a sample doesn’t protect against sampling bias. A survey of one million people that only reaches one type of person is less accurate than a properly randomized survey of one thousand. The method of selection matters far more than the number of responses.