A good sample accurately reflects the larger group it’s drawn from, producing results you can trust and apply beyond the study itself. That sounds simple, but achieving it requires getting several things right: defining who you’re studying, choosing enough people, picking them fairly, and making sure the ones who participate don’t differ meaningfully from those who don’t. Here’s how each of those pieces works.
It Represents a Clearly Defined Population
The single most important quality of a good sample is representativeness. A sample is representative when its results match, within a reasonable margin of error, what you’d find if you could study the entire population. This happens when the people in your sample share the same mix of key characteristics (age, gender, health status, geography, income, or whatever matters for your question) as the broader group you care about.
Random sampling is the gold standard for achieving this. When every member of your target population has an equal chance of being selected, the sample will, on average, mirror the population’s diversity without you having to engineer it. But representativeness can also work at a more interpretive level. Even when exact numbers differ between a sample and the population, a study can still be useful if the general direction of its findings would hold true in the wider group. A clinical trial conducted at a single hospital, for example, may not perfectly match national demographics, but the biological mechanism it uncovers could still apply broadly.
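To make the "equal chance of selection" idea concrete, here is a minimal sketch of simple random sampling in Python. The sampling frame and sample size are hypothetical; the point is that selection is driven entirely by the random number generator, not by any characteristic of the participants.

```python
import random

# Hypothetical sampling frame: every member of the target population, listed once.
sampling_frame = [f"participant_{i}" for i in range(10_000)]

# Simple random sampling: each member has an equal chance of being chosen.
rng = random.Random(42)  # fixed seed so the draw is reproducible
sample = rng.sample(sampling_frame, k=400)

print(len(sample))       # 400
print(len(set(sample)))  # 400 — sampling without replacement, no duplicates
```

Because `random.sample` draws without replacement, no one appears twice, and on average the sample's mix of characteristics mirrors the frame's.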
None of this means anything, though, without a clearly defined target population. Saying a sample is “representative” without specifying representative of whom is meaningless. Before collecting any data, you need to spell out exactly which group your results are meant to describe.
The Sample Is Large Enough to Be Reliable
Size matters, but not in the way most people assume. A bigger sample isn’t automatically better. What you need is a sample large enough to detect a real effect if one exists, while being precise enough that your results aren’t dominated by random noise.
Researchers calculate sample size using a handful of inputs: the level of precision they want (margin of error), the confidence level they’re targeting, the statistical power of the study, and the expected size of the effect they’re looking for. The standard convention is to set confidence at 95%, power at 80%, and aim for a margin of error between 4% and 8%. These aren’t arbitrary numbers. A 95% confidence level means that if you repeated the study 100 times, the confidence intervals you calculated would contain the true value in roughly 95 of those repetitions. Power of 80% means you have an 80% chance of detecting an effect that actually exists.
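For estimating a proportion, these inputs combine in a standard margin-of-error formula: n = z²·p(1−p)/e². A quick sketch, using the conservative worst-case assumption p = 0.5 and the z-score 1.96 that corresponds to 95% confidence:

```python
import math

def sample_size_proportion(margin_of_error, z=1.96, p=0.5):
    """Minimum sample size to estimate a proportion in a large population.

    margin_of_error: desired half-width of the confidence interval (e.g. 0.05)
    z: z-score for the confidence level (1.96 corresponds to ~95%)
    p: expected proportion; 0.5 is the conservative worst case
    """
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n)

print(sample_size_proportion(0.05))  # 385 — the familiar "about 400" survey size
print(sample_size_proportion(0.04))  # 601 — tighter precision costs more people
```

Note how quickly the required size grows as the margin of error shrinks: halving the margin roughly quadruples the sample.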
One common misconception is that you always need to know the total population size to calculate sample size. In most cases, population size doesn’t factor into the formula at all. It only matters when the population is small and finite, like all students enrolled in a specific program or all employees at a single company. For large populations, adding more people to the total doesn’t change how many you need in your sample.
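The small-population exception is handled with the usual finite population correction, n_adj = n / (1 + (n − 1)/N). A brief sketch showing both cases, reusing the 385 figure from the margin-of-error formula above as the large-population starting point:

```python
import math

def finite_population_correction(n, population_size):
    """Adjust an infinite-population sample size n for a finite population of size N."""
    return math.ceil(n / (1 + (n - 1) / population_size))

# With a small, finite population the required sample shrinks noticeably:
print(finite_population_correction(385, 500))        # 218

# With a large population the correction is negligible — as the text says,
# adding more people to the total doesn't change how many you need:
print(finite_population_correction(385, 1_000_000))  # 385
```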
Qualitative Research Works Differently
In qualitative research (interviews, focus groups, case studies), sample size isn’t about statistical power. It’s about reaching “saturation,” the point where new interviews stop producing new insights. A systematic review of empirical studies found that saturation typically occurs within 9 to 17 interviews or 4 to 8 focus group discussions, particularly when the study population is relatively similar and the research questions are narrowly focused. Multi-country studies or those exploring broad themes needed larger samples. This gives researchers a practical starting point, though the right number still depends on the complexity of the topic and the diversity of participants.
Bias Is Minimized at Every Stage
A sample can be the right size and still produce misleading results if bias creeps in. Bias means the sample systematically differs from the population in ways that skew the findings. It can enter at several points.
Selection bias happens when the method used to recruit participants favors certain types of people over others. If you survey customers who visit a store on weekday mornings, you’ll over-represent retirees and under-represent people who work 9-to-5 jobs. The sample looks complete, but it’s tilted.
Non-response bias arises when people who decline to participate differ from those who agree. In health surveys, for instance, the sickest individuals may be too ill to respond, and the healthiest may not see the point. Both extremes drop out, leaving a sample that clusters in the middle and misrepresents the true range. These two types of bias can pull estimates in opposite directions, sometimes inflating a number, sometimes deflating it, which makes the distortion hard to predict without careful analysis.
Volunteer bias is a specific form of selection bias where people who opt into a study are inherently different from those who don’t. Volunteers tend to be more motivated, more health-conscious, or more affected by the topic being studied. This is why clinical trials that rely entirely on volunteers can’t always generalize their findings to the broader population.
Response rates are one practical way to gauge how much non-response bias might affect your results. For most research, a response rate around 60% is considered the benchmark. Email surveys without follow-up often land at just 25% to 30%, which introduces serious risk of bias. Multi-mode approaches, combining email with phone calls or mailed reminders, can push rates as high as 70%.
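The arithmetic behind these benchmarks is simple: completed responses divided by everyone invited. A small illustration with hypothetical numbers:

```python
def response_rate(completed, invited):
    """Simple response rate: completed responses over everyone invited."""
    return completed / invited

# Hypothetical survey: 600 invitations.
print(f"{response_rate(180, 600):.0%}")  # 30% — email-only, serious bias risk
print(f"{response_rate(420, 600):.0%}")  # 70% — achievable with multi-mode follow-up
```

The calculation is trivial; the hard part is accounting for who the missing 30% to 70% are and whether they differ from respondents.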
The Sampling Frame Is Complete
Before you can draw a sample, you need a list of everyone in the target population. This list is called the sampling frame, and its quality directly determines whether your sample can be representative. An ideal frame lists every member of the population exactly once. In practice, that rarely happens.
The most common problem is undercoverage, where some members of the target population are missing from the list entirely. If you’re surveying households using a phone directory, you’ll miss everyone with an unlisted number and everyone who only uses a cell phone. Phone-based lists are especially prone to this kind of gap. The people who are missing aren’t random; they tend to share characteristics (younger, lower income, more transient) that make their absence a source of systematic bias rather than just random noise.
The reverse problem, overcoverage, happens when the list includes people who aren’t actually part of your target population. A list of registered voters, for example, might include people who have moved away or died. Overcoverage is generally easier to fix during data collection (you simply screen people out), but undercoverage is harder to address because you can’t include people you don’t know about. If the missing people happen to be similar to the people on the list in the ways that matter for your study, the bias may be small. But if they differ systematically, your results will be off in ways you can’t easily correct after the fact.
Eligibility Criteria Are Thoughtfully Defined
Good samples don’t just happen through random selection. Researchers also define who qualifies for the study and who doesn’t, using inclusion and exclusion criteria. These rules shape the sample’s composition and determine how broadly the results can be applied.
Inclusion criteria describe the essential characteristics someone must have to participate. These typically cover demographics (age range, gender), clinical features (a specific diagnosis, symptom duration), and geographic location. They should connect directly to the research question. If a study is investigating a treatment for knee arthritis in adults over 50, including 25-year-olds would dilute the findings without adding useful information.
Exclusion criteria remove people who technically qualify but whose participation could compromise the data. Common reasons for exclusion include conditions that would make someone likely to drop out, comorbidities that could confuse the results, or circumstances that would lead to inaccurate data collection.
Researchers sometimes make avoidable mistakes here. One frequent error is using the same variable for both inclusion and exclusion, like including only women and then listing “being male” as an exclusion criterion. That’s redundant and muddies the protocol. Another mistake is including criteria that have nothing to do with the research question, which narrows the sample unnecessarily and makes the results harder to generalize. Perhaps most importantly, failing to describe key inclusion variables makes it impossible for anyone reading the study to judge whether the findings apply to a different group.
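One way to keep criteria clean is to write them as explicit, non-overlapping checks, so inclusion and exclusion never restate the same variable. A hypothetical screening step for the knee-arthritis example above (field names and criteria are invented for illustration):

```python
# Hypothetical eligibility screen: inclusion tied to the research question,
# exclusion limited to factors that would compromise the data.

def is_eligible(person):
    # Inclusion criteria: essential characteristics for the research question.
    included = person["age"] >= 50 and person["diagnosis"] == "knee_arthritis"
    # Exclusion criteria: reasons participation would compromise the data.
    excluded = person["planning_to_relocate"] or "severe_comorbidity" in person["conditions"]
    return included and not excluded

candidates = [
    {"age": 62, "diagnosis": "knee_arthritis", "planning_to_relocate": False, "conditions": []},
    {"age": 45, "diagnosis": "knee_arthritis", "planning_to_relocate": False, "conditions": []},
    {"age": 70, "diagnosis": "knee_arthritis", "planning_to_relocate": True, "conditions": []},
]
eligible = [p for p in candidates if is_eligible(p)]
print(len(eligible))  # 1 — only the first candidate passes both screens
```

Keeping the two rule sets separate also makes the protocol easy to report, which addresses the final point above: readers can see exactly which variables defined the sample.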
Practical Signs of a Strong Sample
If you’re evaluating someone else’s research or planning your own data collection, a few quick checks can tell you whether the sample holds up:
- The target population is explicitly stated. You should be able to identify exactly who the results are meant to describe.
- The sampling method is transparent. Random selection is strongest, but even convenience samples can be useful if their limitations are acknowledged and the population is well-defined.
- The sample size is justified. Look for a power calculation or, in qualitative work, a rationale for why saturation was reached.
- Response rates are reported. Anything well below 60% in survey research should raise questions about who’s missing.
- The sampling frame is described. Knowing where the list of potential participants came from helps you spot gaps in coverage.
- Inclusion and exclusion criteria are specific and relevant. They should connect to the research question without being unnecessarily restrictive.
No sample is perfect. Every study involves trade-offs between cost, feasibility, and rigor. What separates a good sample from a poor one isn’t the absence of limitations, but whether those limitations are understood, disclosed, and accounted for in the interpretation of results.

