True random samples are rarely used because they require a complete list of every individual in a population, which is nearly impossible to obtain for most research questions. Even when researchers can assemble such a list, the cost of reaching randomly selected people across wide geographic areas, combined with steep declines in response rates, makes the method impractical for the vast majority of studies.
The Sampling Frame Problem
A true random sample starts with what statisticians call a sampling frame: a numbered list of every single member of the population you want to study. If you’re studying all hospitals in the U.S. that perform a specific surgery, building that list is doable. If you’re studying all adults with a particular health condition, or all parents of toddlers, or all residents of a country, creating a complete and accurate list ranges from extremely difficult to flat-out impossible.
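To make the frame requirement concrete, here is a minimal sketch in Python. The frame itself, a made-up list of 5,000 hospitals, is purely illustrative; the point is that the random draw is the trivial part, while assembling the complete frame list is the part that rarely survives contact with a real population.

```python
import random

# Hypothetical sampling frame: a numbered list of every hospital in scope.
frame = [f"hospital_{i:04d}" for i in range(5000)]

random.seed(42)                     # fixed seed so the example is reproducible
sample = random.sample(frame, 100)  # every unit has an equal chance of selection

print(sample[:5])
```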
This is where coverage error enters the picture. Any list you use as a starting point will leave people out. Electoral rolls exclude people under voting age. Telephone directories miss unlisted numbers and most mobile users. Email lists skip people without internet access. Administrative databases only capture individuals who have interacted with a particular system. Each of these gaps means certain groups are systematically underrepresented before the sampling even begins. Older adults, lower-income households, rural residents, people with disabilities, and racial minorities are consistently more likely to fall through these gaps, particularly in any digital-only sampling frame.
Clinical research faces the same wall. If you wanted to randomly sample from all people worldwide who have had a stroke, you would first need to identify and locate every one of them. That’s not feasible, so researchers instead recruit from accessible patient populations at specific hospitals or clinics, accepting the trade-off in generalizability.
Non-Response Destroys Randomness
Even when researchers manage to draw a genuinely random sample, they still need people to actually participate, and this is where the method has deteriorated most dramatically over the past few decades. Response rates to Pew Research Center telephone surveys dropped from 36 percent in 1997 to 9 percent in 2016, and contemporary telephone surveys in the United States now routinely come in below 10 percent.
That matters enormously. A random sample with a 7 percent response rate is no longer random in any meaningful sense. The people who pick up the phone, open the envelope, or click the survey link differ systematically from those who don’t. They tend to be older, more educated, more politically engaged, and more likely to have free time. One analysis of a high-quality Pew poll built from random contacts found that young adults were underrepresented by 10 percentage points relative to their share of the population, people over 65 were overrepresented by 10 points, respondents with only a high school education were underrepresented by 11 points, and those with postgraduate degrees were overrepresented by 12 points.
When non-response is systematic like this, the combination of low response rates and consistent differences between responders and non-responders can severely bias any conclusions drawn about the broader population. What started as a probability sample effectively becomes a self-selected convenience sample with extra steps.
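A back-of-the-envelope sketch shows how that happens. Every number below is hypothetical, but the mechanism is general: once response rates differ across groups that also differ on the outcome, the observed rate drifts away from the true one, and collecting more responses of the same kind never pulls it back.

```python
# All figures are invented, chosen only to illustrate the mechanism.
p_young, p_old = 0.30, 0.70          # population shares by age group
view_young, view_old = 0.60, 0.45    # true rate of some opinion in each group
rr_young, rr_old = 0.04, 0.12        # response rates: young people answer far less

true_rate = p_young * view_young + p_old * view_old

# Share of *respondents* who are young, after differential non-response:
resp_young = (p_young * rr_young) / (p_young * rr_young + p_old * rr_old)
observed_rate = resp_young * view_young + (1 - resp_young) * view_old

print(f"true: {true_rate:.3f}   observed: {observed_rate:.3f}")
# true: 0.495   observed: 0.469 -- biased low, no matter how many calls are made
```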
The Cost Is Prohibitive
Probability samples need to be large to deliver usefully narrow margins of error, and reaching randomly selected individuals scattered across a wide area is expensive. You can’t just survey whoever walks into your lab or responds to a social media ad. You need to contact specific people, often repeatedly, sometimes sending interviewers to their homes or mailing multiple follow-up requests. Each additional percentage point of response rate costs more than the last.
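A rough sample-size calculation illustrates the arithmetic. The sketch below uses the standard normal-approximation margin of error for a proportion with worst-case assumptions, not figures from any particular survey: the required number of completed interviews grows with the inverse square of the margin of error, and at single-digit response rates each completed interview means contacting many more people than ever answer.

```python
import math

def required_n(moe: float, p: float = 0.5, z: float = 1.96) -> int:
    """Completed interviews needed for a given margin of error at 95% confidence
    (worst-case p = 0.5, normal approximation for a proportion)."""
    return math.ceil(z**2 * p * (1 - p) / moe**2)

print(required_n(0.03))  # 1068 completes for a +/-3 point margin
print(required_n(0.01))  # 9604 -- a 3x tighter margin costs ~9x the interviews
# At a 9% response rate, 1,068 completes means contacting roughly 12,000 people.
```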
Convenience samples, by contrast, are cheap, efficient, and simple to execute. Researchers in developmental science, for instance, rely almost entirely on non-probability convenience samples because probability samples are cost-prohibitive and most existing probability samples aren’t designed to answer the developmental questions they’re studying. This pattern holds across many fields. When budgets are finite and timelines are tight, the practical advantages of non-random methods usually win.
Better Alternatives Exist for Most Purposes
Researchers don’t simply choose between perfect randomness and no randomness at all. Stratified random sampling divides a population into subgroups (by age, income, region, or other relevant characteristics) and then samples within each subgroup. This produces smaller estimation errors than simple random sampling, especially when the people within each subgroup are similar to each other. It also guarantees that important subgroups are represented, something a simple random draw might miss by chance alone.
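A minimal sketch of stratified sampling with proportional allocation, using an invented population of 10,000 units spread across three regions, looks like this; note that the small rural stratum is guaranteed its 50 sampled units rather than left to chance.

```python
import random

random.seed(1)
# Hypothetical population: 10,000 units across three regions (the strata).
population = (
    [("urban", i) for i in range(6000)]
    + [("suburban", i) for i in range(3000)]
    + [("rural", i) for i in range(1000)]
)
strata = {}
for region, unit in population:
    strata.setdefault(region, []).append(unit)

n_total = 500
sample = []
for region, units in strata.items():
    # Allocate the sample in proportion to each stratum's population share.
    n_stratum = round(n_total * len(units) / len(population))
    sample.extend((region, u) for u in random.sample(units, n_stratum))

print({r: sum(1 for reg, _ in sample if reg == r) for r in strata})
# {'urban': 300, 'suburban': 150, 'rural': 50}
```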
Cluster sampling is another workaround. Instead of listing every individual, researchers randomly select groups (schools, neighborhoods, clinics) and then survey everyone or a random subset within those clusters. This dramatically reduces the need for a complete population list while preserving some of the statistical properties that make probability sampling valuable.
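A sketch of the two-stage version, with hypothetical schools and roster sizes: the only lists required are the list of schools and the rosters of the few schools actually drawn.

```python
import random

random.seed(7)
# Hypothetical frame of clusters: 40 schools with rosters of varying size.
schools = {f"school_{i:02d}": [f"s{i:02d}_{j}" for j in range(random.randint(200, 800))]
           for i in range(40)}

chosen = random.sample(list(schools), 5)  # stage 1: randomly select clusters
sample = [student
          for school in chosen
          for student in random.sample(schools[school], 30)]  # stage 2: within clusters

print(chosen)
print(len(sample))  # 5 schools x 30 students = 150
```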
For surveys where even these methods are too expensive or logistically difficult, researchers increasingly rely on statistical weighting to adjust non-random samples after the fact. The most common technique, called raking, takes a set of variables where the true population distribution is known (gender, education level, age, race) and iteratively adjusts each respondent’s weight until the sample’s demographics match the population’s. More complex approaches layer in propensity scoring, which estimates how likely each type of person was to end up in the sample and corrects accordingly. Pew Research Center and other major polling organizations now routinely combine multiple weighting techniques to bring imperfect samples closer to what a random sample would have produced.
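Production weighting systems layer on trimming, convergence checks, and many more variables, but the core of raking is a short loop. The sketch below uses made-up respondents and made-up population targets: the sample skews female and college-educated, and repeated rescaling pulls both margins onto their targets at once.

```python
# Hypothetical respondents: the sample is 55% female and 70% college-educated.
respondents = (
    [{"gender": "f", "edu": "college"}] * 40
    + [{"gender": "f", "edu": "hs"}] * 15
    + [{"gender": "m", "edu": "college"}] * 30
    + [{"gender": "m", "edu": "hs"}] * 15
)
# Assumed known population margins for the raking variables.
targets = {
    "gender": {"f": 0.51, "m": 0.49},
    "edu": {"college": 0.35, "hs": 0.65},
}

weights = [1.0] * len(respondents)
for _ in range(50):  # iterate until the weighted margins converge
    for var, target in targets.items():
        total = sum(weights)
        for level, share in target.items():
            idx = [i for i, r in enumerate(respondents) if r[var] == level]
            current = sum(weights[i] for i in idx) / total
            for i in idx:
                weights[i] *= share / current  # rescale this level toward its target

for var, target in targets.items():
    total = sum(weights)
    print(var, {lvl: round(sum(w for w, r in zip(weights, respondents)
                               if r[var] == lvl) / total, 3) for lvl in target})
# gender {'f': 0.51, 'm': 0.49}
# edu {'college': 0.35, 'hs': 0.65}
```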
The Quality Gap Is Narrowing, but Real
Weighted non-probability samples can perform well for many research questions, but they aren’t a perfect substitute. One critical difference: whenever there’s a systematic difference between responders and non-responders on the topic being measured, the error in non-random samples scales with the size of the population rather than the size of the sample. In a true random sample, increasing the sample size reliably shrinks the margin of error, which falls roughly in proportion to one over the square root of the sample size. In a non-random sample, adding more respondents of the same skewed type doesn’t fix the underlying bias; the estimate simply converges, with ever-greater apparent precision, on the wrong answer.
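A quick simulation makes the contrast visible. Everything here is invented (a 50/50 population in which people who hold the view are twice as likely to opt in), but the pattern is the general one: the random-sample estimate tightens around the truth as n grows, while the opt-in estimate converges, with false confidence, on roughly 0.667.

```python
import random

random.seed(3)
N = 1_000_000
# Hypothetical population: exactly half hold the view (coded 1).
population = [1] * (N // 2) + [0] * (N // 2)

def opt_in_sample(n):
    """Draw n respondents when view-holders self-select at twice the rate."""
    out = []
    while len(out) < n:
        person = random.choice(population)
        respond_p = 0.2 if person == 1 else 0.1  # differential self-selection
        if random.random() < respond_p:
            out.append(person)
    return out

for n in (500, 5_000, 50_000):
    srs = random.sample(population, n)  # true simple random sample
    opt = opt_in_sample(n)              # biased opt-in sample of the same size
    print(f"n={n:>6}  random={sum(srs)/n:.3f}  opt-in={sum(opt)/n:.3f}")
# The random column homes in on 0.500; the opt-in column stays near 0.667.
```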
Data quality in online opt-in panels has also declined. A decade ago, researchers typically needed to discard 5 to 10 percent of responses from online panels due to poor quality (respondents clicking through without reading, bots, duplicate entries). That proportion has ballooned to 35 to 50 percent in recent estimates. Confidence intervals calculated from these panels are unlikely to be reliable for making population-level inferences without careful modeling.
The practical result is a landscape where true random sampling remains the theoretical gold standard, but the conditions it requires (a complete list, universal reachability, high willingness to participate, and a sufficient budget) almost never coexist outside of a few well-funded government surveys like the U.S. Census Bureau’s American Community Survey. Most research settles for the best approximation available, then uses statistical tools to close the gap.

