Sampling in research is the process of selecting a smaller group of participants from a larger population so that findings from that group can be applied back to the population as a whole. Instead of studying every single person in a group (which is rarely possible), researchers pick a subset and use their responses to draw broader conclusions. The method used to pick that subset has a direct impact on whether those conclusions are trustworthy.
Every sampling plan revolves around three core concepts. The target population is the entire group the researcher wants to learn about. The sampling frame is the actual list of people available to be selected, such as a membership directory or patient registry. The sample is the final group chosen from that list. When the sampling frame closely matches the target population, results are more generalizable. When it doesn’t, bias creeps in.
Probability vs. Non-Probability Sampling
All sampling methods fall into two broad categories. In probability sampling, every person in the target population has a known, nonzero chance of being selected (an equal chance in the simplest designs). This is the gold standard because it provides a mathematical basis for estimating how close your sample’s results are to the true population values. In non-probability sampling, selection is not random, meaning some people are more likely to be included than others, and there is no way to quantify that imbalance. Benchmarking studies generally find that non-probability samples describe a population less accurately than probability samples.
That said, probability sampling requires more time, money, and access to a complete list of the population. Non-probability sampling costs significantly less and can be deployed quickly, which is why it remains common in exploratory studies, qualitative research, and situations where a full population list simply doesn’t exist.
Types of Probability Sampling
Simple Random Sampling
This is the most straightforward approach. You start with a complete list of everyone in the population (the sampling frame), then use a random number generator or lottery method to pick participants. Every person has an identical chance of being chosen. It works well when the population is relatively small and accessible, but because nothing in the draw guarantees the inclusion of small subgroups, a minority group can end up with too few members in the sample for meaningful analysis.
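This selection step can be sketched in a few lines of Python; the frame of 1,000 numeric IDs is hypothetical:

```python
import random

# Hypothetical sampling frame: IDs for a population of 1,000 people.
frame = list(range(1, 1001))

random.seed(42)  # fixed seed only so the example is reproducible

# random.sample draws without replacement, so every ID has the same
# chance of being selected and no one is picked twice.
sample = random.sample(frame, k=50)

print(len(sample))       # 50
print(len(set(sample)))  # 50 -- no duplicates
```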
Stratified Random Sampling
To solve the minority-representation problem, researchers divide the population into subgroups (called strata) based on characteristics like age, gender, income level, or diagnosis. They then draw a random sample from each subgroup separately. This guarantees that smaller groups are adequately represented and lets researchers compare results between strata as if each were its own mini-study. It requires more planning but produces more nuanced data.
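A minimal sketch of the idea, assuming a hypothetical frame with one small stratum, might look like this; drawing the same fraction from each stratum guarantees the rare group ends up in the sample:

```python
import random

# Hypothetical frame: a 100-person stratum that a simple random draw
# could easily underrepresent.
frame = {
    "common": [f"C{i}" for i in range(900)],
    "rare":   [f"R{i}" for i in range(100)],
}

def stratified_sample(strata, fraction, seed=0):
    """Draw the same sampling fraction from each stratum independently."""
    rng = random.Random(seed)
    return {name: rng.sample(members, k=max(1, round(len(members) * fraction)))
            for name, members in strata.items()}

sample = stratified_sample(frame, fraction=0.1)
print({name: len(chosen) for name, chosen in sample.items()})
# {'common': 90, 'rare': 10}
```

Strata can also be sampled at different rates (oversampling the rare group), as long as the analysis later weights each stratum back to its true population share.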
Systematic Sampling
Instead of drawing each participant individually at random, researchers pick every nth person from a list or a flow of participants, typically starting from a randomly chosen point within the first interval. For example, selecting every 5th patient who walks into a clinic gives you the 5th, 10th, 15th, 20th, and so on. This method doesn’t always require a complete sampling frame ahead of time, making it practical in clinical settings where patients arrive continuously. The key requirement is that the list or flow has no hidden pattern that aligns with the sampling interval, which would introduce bias; sampling every 7th day of clinic records, for instance, would capture the same weekday every time.
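In code, systematic selection is just a slice with a step, plus a random offset for the first interval (a sketch; the patient list is hypothetical):

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every k-th element, starting at a random offset within
    the first interval so position 0 is not always favored."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return frame[start::k]

patients = list(range(1, 101))  # hypothetical arrival order
chosen = systematic_sample(patients, k=5, seed=1)
print(len(chosen))  # 20 -- exactly one patient per interval of 5
```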
Cluster Sampling
When the population is spread across a wide geographic area, researchers divide it into clusters, often by location (schools, hospitals, neighborhoods). They randomly select a few clusters and then study everyone, or a random sample, within those chosen clusters. This is far cheaper than traveling to every location, but it sacrifices some precision because people within the same cluster tend to be more similar to each other than to the broader population.
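The two stages can be sketched as follows, with hypothetical clinics standing in for clusters:

```python
import random

# Hypothetical population: 20 clinics with 30 patients each.
clusters = {f"clinic_{i}": [f"clinic_{i}_patient_{j}" for j in range(30)]
            for i in range(20)}

rng = random.Random(7)

# Stage 1: randomly select a few whole clusters.
chosen_clusters = rng.sample(sorted(clusters), k=4)

# Stage 2: study everyone inside the chosen clusters (one-stage cluster
# sampling); drawing a further random subsample here would make it two-stage.
sample = [p for c in chosen_clusters for p in clusters[c]]
print(len(sample))  # 120 -- 4 clusters x 30 patients each
```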
Types of Non-Probability Sampling
Convenience Sampling
Researchers select whoever is easiest to reach. A professor surveying students in their own class or a clinic recruiting patients who happen to visit that week are both using convenience sampling. It’s fast and inexpensive, but the results can’t be confidently generalized because the sample may differ from the broader population in ways the researcher can’t measure.
Quota Sampling
This is a structured version of convenience sampling. Researchers set demographic targets (for instance, 50% women, 30% over age 65) and then fill those quotas using non-random selection. It produces a sample that looks like the population on paper, but because individuals within each quota aren’t randomly chosen, hidden biases can still distort the findings.
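The quota-filling logic is simple to express; in this sketch, respondents are accepted in arrival order (non-random) until each hypothetical demographic quota is full:

```python
def fill_quotas(arrivals, quotas):
    """Accept respondents in arrival order until every quota is full.
    Selection within each quota is first-come, not random."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person, group in arrivals:
        if group in counts and counts[group] < quotas[group]:
            sample.append(person)
            counts[group] += 1
    return sample

# Hypothetical walk-in stream: every third arrival is a man.
arrivals = [(f"p{i}", "man" if i % 3 == 0 else "woman") for i in range(30)]
sample = fill_quotas(arrivals, {"woman": 5, "man": 3})
print(len(sample))  # 8 -- quotas met, but only the earliest arrivals got in
```

The sample matches the quotas exactly, yet everyone in it arrived early in the stream, which is precisely the kind of hidden bias the paragraph above describes.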
Purposive Sampling
The researcher uses their own judgment to hand-pick participants who are especially relevant to the research question. This is common in qualitative research where the goal isn’t to generalize to a huge population but to deeply understand a particular experience or phenomenon. If you’re studying how people cope after a rare surgery, for example, you’d deliberately seek out people who had that surgery rather than randomly sampling the general public.
Snowball Sampling
Participants recruit other participants. A researcher starts with a small group and asks each person to refer others who fit the study criteria. This is particularly useful for reaching populations that are hidden or hard to access, such as undocumented immigrants, people with stigmatized conditions, or members of underground communities. The tradeoff is that the sample tends to cluster around social networks, so it may miss people who are more isolated.
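Recruitment through referrals behaves like a breadth-first traversal of a social network, which makes the method's blind spot easy to demonstrate (the referral graph here is hypothetical):

```python
from collections import deque

# Hypothetical referral network: each person names contacts who fit
# the study criteria. "isolated" is never referred by anyone.
referrals = {
    "seed_a": ["p1", "p2"],
    "p1": ["p3"],
    "p2": ["p3", "p4"],
    "p3": [],
    "p4": ["p5"],
    "p5": [],
    "isolated": [],
}

def snowball(seeds, network, max_size=10):
    """Breadth-first recruitment: start from seed participants and
    follow referrals until the target sample size is reached."""
    seen, queue = set(seeds), deque(seeds)
    while queue and len(seen) < max_size:
        for contact in network.get(queue.popleft(), []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return seen

sample = snowball(["seed_a"], referrals)
print(sorted(sample))  # 'isolated' never appears -- the method's blind spot
```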
How Sampling Bias Distorts Results
Sampling bias occurs when certain groups are systematically excluded from the sample, making it unrepresentative. A phone survey, for instance, misses people without phones, people who only have cell phones with out-of-area codes, and people who screen their calls. Each of those exclusions removes a specific type of person, not a random one, which skews the data in a predictable direction.
Voluntary response samples are especially prone to bias. When researchers ask people to opt in, only those with strong opinions tend to participate. People who are indifferent to the topic rarely bother, so the results overrepresent extreme views. This is why online polls shared on social media are unreliable for drawing population-level conclusions. There’s no way to identify, describe, or account for the people who could have seen the survey but chose not to respond.
Online surveys face an additional layer of problems. They reach only people who are literate, have internet access, and are interested enough in the subject to spend time responding. The best-case scenario for an online survey is one sent individually to every member of a defined group (like all employees at a company) where most of them respond. In that situation, you can reasonably treat the results as representative of that group. A survey posted publicly on social media doesn’t meet that standard.
What Determines Sample Size
One of the most common questions in research design is “how many participants do I need?” The answer depends on three primary factors working together.
- Confidence level: How sure you want to be that your results reflect the true population. The convention in most social and biomedical research is 95%, meaning that if you repeated the study 100 times, the intervals from roughly 95 of those samples would capture the true value. Physical sciences often use much higher thresholds, sometimes as high as 99.99%.
- Margin of error: The acceptable range of deviation between your sample’s results and the true population value. A smaller margin of error requires a larger sample.
- Effect size: The minimum difference between groups that you consider meaningful. If you’re looking for a small difference, like a 2% improvement from a treatment, you need far more participants than if you’re looking for a 20% improvement. The required sample size grows sharply as the effect size you want to detect shrinks, roughly with the inverse square of the effect size.
Variability in the population also matters. If the characteristic you’re measuring varies widely from person to person, you need a bigger sample to capture that range. In practical terms, researchers often end up selecting the largest sample they can manage given their budget and timeline, then checking whether that number meets the statistical requirements for their study design.
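The interplay between confidence level and margin of error can be made concrete with Cochran's standard formula for sizing a sample that estimates a proportion, n = z² · p(1 − p) / e² (a sketch using Python's standard library; the function name is ours):

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(confidence, margin, p=0.5):
    """Cochran's formula for a proportion: n = z^2 * p * (1 - p) / e^2.
    p = 0.5 is the conservative default (largest variance)."""
    # two-sided z-score for the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size_for_proportion(0.95, 0.05))  # 385
print(sample_size_for_proportion(0.95, 0.03))  # 1068 -- tighter margin, larger n
```

Note how shrinking the margin of error from 5% to 3% nearly triples the required sample, which is why margins below a few percentage points get expensive quickly.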
Steps in Building a Sampling Plan
Regardless of which method you choose, the sampling process follows a consistent sequence. First, define your target population clearly. “Adults with diabetes” is too vague. “Adults aged 18 to 65 diagnosed with type 2 diabetes in the past five years” gives you boundaries you can work with.
Second, build or obtain your sampling frame, the actual list from which you’ll draw participants. This might be a patient database, a school enrollment roster, or a membership directory. Third, determine your sample size based on your confidence level, margin of error, and available resources. Fourth, select your sample using one of the methods above. Fifth, contact the selected participants and obtain their consent to participate. At this stage, you’ll also need a plan for handling non-responses, since people who decline or can’t be reached may differ systematically from those who participate.
Choosing the Right Method
The best sampling method depends on what you’re trying to accomplish, not on which method sounds most rigorous. If you need results that generalize to an entire population and you have the budget for it, probability sampling is the clear choice. If you’re exploring a new topic, working with a hard-to-reach group, or conducting qualitative interviews where depth matters more than breadth, non-probability methods are not only acceptable but often more appropriate.
Many real-world studies use a combination. A national health survey might use cluster sampling to choose geographic areas, stratified sampling to ensure demographic representation within those areas, and then simple random sampling to pick individual households. The goal at every stage is the same: minimize the gap between who you study and who you want your findings to apply to.

