An unrepresentative sample is a group of participants, responses, or data points that doesn’t accurately reflect the larger population it’s supposed to stand in for. When researchers study a question, they almost never examine every single person in a population. Instead, they select a sample and draw conclusions from it. If that sample is skewed in some systematic way, the conclusions will be wrong, sometimes dramatically so.
In a truly representative sample, every member of the population has an equal chance of being selected. An unrepresentative sample breaks that rule, whether by design, by accident, or by who chooses to participate.
Why Representativeness Matters More Than Size
One of the most common misunderstandings in statistics is that a big sample automatically equals a good sample. It doesn’t. A sample can include millions of people and still be wildly unrepresentative if the method used to gather it systematically excludes or over-represents certain groups. A larger sample will tend to give you more precise estimates of whatever population it does capture, but precision is useless if you’re precisely measuring the wrong group.
A small, well-selected random sample will outperform a massive, poorly selected one nearly every time. The distinction matters because size gives you statistical power (the ability to detect real differences), while representativeness gives you validity (the ability to apply those findings to the real world). You need both.
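To see the difference in action, here’s a minimal simulation with entirely hypothetical numbers, using only Python’s standard library. A thousand-person random sample lands near the truth; a sample two hundred times larger, drawn from a single income group, is precise and wrong.

```python
import random

random.seed(42)

# Hypothetical population of 1,000,000 people. 70% are lower-income and
# support a policy at a 70% rate; 30% are higher-income and support it at 37%.
population = (
    [("low", random.random() < 0.70) for _ in range(700_000)]
    + [("high", random.random() < 0.37) for _ in range(300_000)]
)
true_rate = sum(supports for _, supports in population) / len(population)

# Small but random: 1,000 people drawn uniformly from the whole population.
small_random = random.sample(population, 1_000)
small_estimate = sum(s for _, s in small_random) / len(small_random)

# Huge but biased: 200,000 people drawn only from the higher-income group,
# like a mailing list built from car registrations and club memberships.
higher_income = [p for p in population if p[0] == "high"]
big_biased = random.sample(higher_income, 200_000)
biased_estimate = sum(s for _, s in big_biased) / len(big_biased)

print(f"True support:        {true_rate:.1%}")       # about 60%
print(f"Small random sample: {small_estimate:.1%}")  # lands near 60%
print(f"Huge biased sample:  {biased_estimate:.1%}") # tight around 37%, wrong
```

No amount of extra data from the higher-income list fixes the second estimate; it only makes the wrong answer more precise.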
The 1936 Literary Digest Poll
The most famous example of an unrepresentative sample happened during the 1936 U.S. presidential election. Literary Digest, a major magazine at the time, mailed 10 million “straw ballots” to predict whether Franklin Roosevelt or Alf Landon would win. They got back 2.4 million responses, a massive dataset by any standard. Their prediction: Landon would win 57% to 43%.
Roosevelt won 62% to 37%. The poll was off by roughly 20 percentage points.
The problem was where the magazine got its mailing list. They pulled names from phone directories, car registrations, and country club memberships. In 1936, at the height of the Great Depression, those were markers of wealth. The central campaign issue was the economy, and Roosevelt’s New Deal appealed strongly to lower-income voters who were almost entirely absent from the sample. On top of that, only about 24% of recipients mailed their ballots back, introducing a second layer of bias: the people motivated enough to respond may have felt differently from those who didn’t bother.
The Literary Digest poll had an enormous sample. It just had the wrong people in it.
How Samples Become Unrepresentative
Sampling bias creeps in through several common paths.
Selection bias happens when the method of choosing participants systematically favors certain groups. Recruiting study volunteers from a single hospital, a single city, or a single website means you’re capturing people who share characteristics that may not match the broader population. A study of mental health among lawyers in one city, for instance, can’t reliably tell you anything about lawyers nationwide, because there’s no way to know how typical that city’s population is.
Nonresponse bias occurs when the people who decline to participate differ in meaningful ways from those who do. The bias has two components: how many people didn’t respond, and how different their answers would have been. The resulting error is roughly the product of the two, which is why even a well-designed survey with a low response rate can produce badly skewed results if the non-responders hold systematically different views or experiences than the responders.
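The arithmetic behind that is worth making explicit: the error equals the nonresponse share multiplied by the gap between what responders and non-responders would have said. A worked sketch with hypothetical numbers:

```python
# Hypothetical survey measuring the share of people dissatisfied with a
# service. Of everyone contacted, only 30% respond.
response_rate = 0.30
dissatisfied_responders = 0.50     # responders: 50% dissatisfied
dissatisfied_nonresponders = 0.20  # non-responders: only 20% dissatisfied

# The true population value mixes both groups by their actual shares.
true_value = (response_rate * dissatisfied_responders
              + (1 - response_rate) * dissatisfied_nonresponders)

# The survey only ever sees responders, so it reports 50%.
survey_estimate = dissatisfied_responders

# Bias = (nonresponse share) x (responder/non-responder gap)
bias = (1 - response_rate) * (dissatisfied_responders
                              - dissatisfied_nonresponders)

print(f"True value:      {true_value:.1%}")      # 29.0%
print(f"Survey estimate: {survey_estimate:.1%}") # 50.0%
print(f"Bias:            {bias:+.1%}")           # +21.0%
```

Shrink either factor, the 70% nonresponse or the 30-point gap, and the bias shrinks with it; that’s why following up with non-responders matters so much.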
Voluntary response bias is a specific form of self-selection. When participation is entirely optional, the people who show up tend to be those with the strongest opinions or experiences. Think of online product reviews: satisfied customers often move on with their lives, while frustrated ones are motivated to write something. The resulting picture looks far more negative than reality.
Online Surveys Are Especially Vulnerable
The rise of online surveys has made unrepresentative sampling more common, not less. The number of published studies using online surveys has roughly doubled in just a few years, but the method carries inherent problems that researchers often understate or ignore.
First, online surveys only reach people who are literate and have internet access, which automatically excludes segments of many populations. Second, and more importantly, there’s often no way to define or describe the population that could have seen and responded to the survey. If you post a survey on social media, you have no idea who saw it, who ignored it, and who chose to click through. The denominator is unknown.
People who respond to online surveys tend to be those who feel strongly about the topic. Patients traumatized by a medical procedure, for example, are far more likely to complete a survey about it than patients whose experience was uneventful. The traumatized patients want to be heard. Others feel no pressure to respond. This self-selection means the sample over-represents extreme experiences, and there’s no reliable way to measure how much that skews the results.
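The size of that distortion follows directly from the two groups’ response rates, as a back-of-the-envelope calculation with hypothetical numbers shows:

```python
# Hypothetical clinic: 10% of patients had a traumatic experience with a
# procedure, 90% an uneventful one. Traumatized patients complete the survey
# 60% of the time; everyone else, 10% of the time.
traumatic_share, uneventful_share = 0.10, 0.90
respond_if_traumatic, respond_if_uneventful = 0.60, 0.10

responses_traumatic = traumatic_share * respond_if_traumatic     # 0.06
responses_uneventful = uneventful_share * respond_if_uneventful  # 0.09

share_in_sample = responses_traumatic / (responses_traumatic
                                         + responses_uneventful)
print(f"Traumatic experiences: {traumatic_share:.0%} of patients, "
      f"but {share_in_sample:.0%} of survey responses")
# -> 10% of patients, but 40% of survey responses
```

In this sketch a 10% experience becomes 40% of the responses, and a reader of the survey alone has no way to detect the inflation.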
What Goes Wrong With Unrepresentative Data
The core problem is that findings from an unrepresentative sample can’t be generalized. They may describe the people who happened to be in the study, but they don’t tell you anything reliable about the population you actually care about. In clinical research, this failure has real consequences. Trial findings may only apply to a narrow subset of the target population, missing important differences in how treatments work across different groups.
When study samples lack diversity, researchers can’t evaluate whether a treatment helps some people more than others, or harms certain groups entirely. A medication tested primarily on young men may behave differently in older women, but if older women weren’t in the sample, that difference is invisible. Inadequate representation also reduces the precision of estimates, making it harder to detect both benefits and risks.
Some degree of sampling bias exists in almost all studies. The question is always how much, and whether it’s large enough to change the conclusions.
How Representative Samples Are Built
The gold standard for avoiding unrepresentative samples is probability sampling, where every person in the target population has a known, nonzero chance of being selected (an equal chance, in the simplest designs). There are several common approaches, sketched in code after the list.
- Simple random sampling works when you have a complete list of everyone in the population (called a sampling frame). You draw names at random, like pulling from a hat or using a computer-generated list.
- Stratified random sampling divides the population into subgroups based on characteristics like age, gender, income, or diagnosis, then randomly samples within each subgroup. This guarantees that minority or underrepresented groups appear in the sample in adequate numbers, which simple random sampling often fails to do.
- Systematic sampling selects every nth person from a list or flow of participants, such as every fifth patient who visits a clinic. It doesn’t always require a complete list upfront.
- Cluster sampling randomly selects groups (like schools, hospitals, or neighborhoods) and then studies everyone or a random subset within those groups. It’s practical when no master list of individuals exists.
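As a rough illustration of how the four designs differ mechanically, here’s a minimal Python sketch over a hypothetical frame of 10,000 patients; the frame, group labels, and sample sizes are all invented for the example:

```python
import random

random.seed(7)

# Hypothetical sampling frame: 10,000 patients, each with an age group.
frame = [{"id": i, "age_group": random.choice(["18-39", "40-64", "65+"])}
         for i in range(10_000)]

# Simple random sampling: draw individuals uniformly from the full frame.
simple = random.sample(frame, 200)

# Stratified random sampling: sample within each age group separately,
# guaranteeing every subgroup shows up in adequate numbers.
strata = {}
for person in frame:
    strata.setdefault(person["age_group"], []).append(person)
stratified = [p for group in strata.values() for p in random.sample(group, 60)]

# Systematic sampling: a random start, then every k-th person in order.
k = 50
start = random.randrange(k)
systematic = frame[start::k]

# Cluster sampling: randomly pick whole groups (here, 5 "clinics" of 2,000
# patients each), then study everyone inside the chosen clusters.
clinics = [frame[i:i + 2_000] for i in range(0, len(frame), 2_000)]
cluster_sample = [p for clinic in random.sample(clinics, 2) for p in clinic]

print(len(simple), len(stratified), len(systematic), len(cluster_sample))
```

Note the trade-offs visible even here: stratified sampling needs the group labels up front, systematic sampling needs only an ordered flow of participants, and cluster sampling trades a master list of individuals for a list of groups.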
Non-probability methods, where participants are chosen based on convenience, availability, or self-selection, are more common in practice but carry a higher risk of producing unrepresentative samples. They’re not automatically invalid, but they require much more caution when drawing conclusions.
How to Spot an Unrepresentative Sample
If you’re reading a study or survey and want to evaluate whether its sample is representative, look for a few things. Check how participants were recruited: was it random selection from a defined population, or convenience sampling from whoever was available? Look at the response rate. A survey with a 20% response rate should raise more concern than one with 75%, especially if there’s no analysis of how non-responders might differ.
Good studies collect basic demographic information (age, sex, location) from both participants and non-participants when possible, then compare the two groups. If respondents skew younger, wealthier, or more educated than the target population, the sample is likely unrepresentative in ways that affect the results. If a study doesn’t discuss its sampling method or acknowledge its limitations, that itself is a red flag.
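When a study does publish those demographic tables, the comparison is easy to run yourself. A minimal sketch with hypothetical census and sample figures:

```python
# Hypothetical shares of each age band in the target population (e.g., from
# census data) versus among the survey's respondents.
population_share = {"18-39": 0.38, "40-64": 0.41, "65+": 0.21}
respondent_share = {"18-39": 0.55, "40-64": 0.35, "65+": 0.10}

for group, pop in population_share.items():
    gap = respondent_share[group] - pop
    if gap > 0.05:
        note = "over-represented"
    elif gap < -0.05:
        note = "under-represented"
    else:
        note = "close to population"
    print(f"{group:>6}: population {pop:.0%}, sample "
          f"{respondent_share[group]:.0%} ({gap:+.0%}) -> {note}")
```

The 5-point threshold here is arbitrary; what matters is whether the gaps are large on characteristics plausibly related to the outcome being measured.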

