Choosing a sample size comes down to four decisions: how confident you need to be in your results, how precise those results must be, how large the effect or difference you’re looking for is, and how variable your population is. Get any of these wrong and you’ll either waste resources on an oversized study or, worse, run one too small to detect anything meaningful. The good news is that once you understand what drives the number, the calculation itself is straightforward.
The Four Inputs That Determine Sample Size
Every sample size calculation, whether for a clinical trial or a customer survey, relies on the same core ingredients. The values you choose for each one directly raise or lower the number of participants you need.
- Confidence level. This is how sure you want to be that your results reflect reality. A 95% confidence level is the industry standard for most research. A 90% level is sometimes acceptable for quicker or lower-stakes projects. Each confidence level has a corresponding Z-score used in formulas: 1.96 for 95%, 1.645 for 90%.
- Margin of error. This is the range of uncertainty you’ll tolerate. A margin of error of plus or minus 3% at 95% confidence is a common benchmark. Tightening that margin (say, to 1%) dramatically increases the sample you need. At 95% confidence, getting a 3% margin of error requires roughly 1,000 respondents, while a 2% margin pushes that to about 2,400.
- Population variability. If everyone in your population is similar, a small sample captures the picture. If responses or measurements vary widely, you need more people. For surveys measuring a yes/no outcome, maximum variability is assumed at 50/50 (meaning you have no idea which way people lean). For measurements like blood pressure or test scores, you use the standard deviation from prior studies.
- Population size. For large populations (tens of thousands or more), the total population barely matters. But for small, known populations, a correction factor shrinks the required sample. The adjusted formula is: n = m / (1 + (m − 1) / N), where m is the sample size you’d need for a large population and N is the actual population size.
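The finite population correction is simple enough to sketch in a few lines of Python. The function name and the example figures (a base sample of 1,067 aimed at a town of 5,000) are illustrative, not from any standard library:

```python
import math

def fpc_adjust(m: int, N: int) -> int:
    """Apply the finite population correction: n = m / (1 + (m - 1) / N),
    where m is the large-population sample size and N is the actual population."""
    return math.ceil(m / (1 + (m - 1) / N))

# A survey needing m = 1,067 respondents for a huge population,
# fielded instead to a town of N = 5,000 people:
print(fpc_adjust(1067, 5000))   # 880 — the correction shrinks the requirement
```

For very large N the correction vanishes: `fpc_adjust(1067, 10**9)` returns 1,067, which is why population size "barely matters" for big populations.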
How Statistical Power Fits In
If you’re comparing two groups rather than estimating a single number, statistical power becomes the driving concept. Power is the probability that your study will detect a real difference when one exists. The accepted minimum is 80%, meaning you have an 80% chance of catching a true effect. Many researchers aim for 90% when the stakes are high.
Power depends on three settings. First is the significance level (alpha), which represents the risk of a false positive. The standard is 0.05, meaning a 5% chance you’ll declare a difference that isn’t real. Some researchers have argued for tightening this to 0.005 to reduce false discoveries, though many in clinical research consider that impractical because it inflates sample sizes and discourages smaller, independent studies. Second is the effect size, which quantifies how big a difference you expect to find. Third is your sample size itself, which is why these calculations are circular: you’re solving for the sample size that achieves your desired power given your alpha and expected effect.
The smaller the effect you need to detect, the more participants you need. This is the single biggest driver of large sample sizes in experimental research.
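The interplay of alpha, effect size, and sample size can be sketched with a normal approximation of power for a two-sided, two-sample comparison. This simplification ignores the opposite rejection tail and the small-sample t correction, so treat its output as approximate; the function name is mine, not a standard API:

```python
from statistics import NormalDist

def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison with
    standardized effect size d and n_per_group participants per arm."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    return NormalDist().cdf(d * (n_per_group / 2) ** 0.5 - z_crit)

print(round(approx_power(0.5, 64), 2))   # ≈ 0.81: a medium effect, 64 per group
print(round(approx_power(0.2, 64), 2))   # ≈ 0.2: same n, small effect — underpowered
```

Running both lines shows the point concretely: holding the sample fixed while the true effect shrinks from medium to small collapses power from roughly 80% to roughly 20%.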
Estimating Effect Size
Effect size is the piece most people struggle with because it requires a judgment call before you’ve collected any data. Cohen’s d is the most widely used measure for comparing two group averages. The classic benchmarks are 0.20 for a small effect, 0.50 for medium, and 0.80 for large. In practical terms, a small effect means the two groups overlap a lot and the difference is subtle. A large effect means the difference is obvious even in a modest sample.
These benchmarks are rough guides. Empirical reviews have found that actual effect sizes in published research tend to run smaller than Cohen’s original categories suggest. Across biomedical studies, the median effect size is closer to 0.26, and in psychosocial research it’s around 0.43. If you’re planning a study and have no pilot data, using a medium effect size (0.50) is a reasonable starting point, but if your intervention is subtle, plan for something smaller and budget for the larger sample that requires.
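If you do have pilot or prior data, Cohen's d is straightforward to compute: the difference in group means divided by the pooled standard deviation. A minimal sketch, with made-up illustrative numbers:

```python
from statistics import mean, stdev

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d: difference in means over the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_sd = (((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Illustrative data only: two small groups whose means differ by 1 point.
print(round(cohens_d([5, 6, 7, 8, 9], [4, 5, 6, 7, 8]), 2))   # 0.63 — a medium-ish effect
```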
The best source for your effect size estimate is a previous study on the same topic. If none exists, that’s what a pilot study is for.
Pilot Studies: When You Don’t Have Prior Data
A pilot study gives you the variability estimates you need to calculate a proper sample size for a full study. Rules of thumb for pilot sample sizes vary, but the most cited recommendations range from 24 to 70 total participants. A common guideline is at least 12 per group in a two-arm trial, though some methodologists recommend 70 total to get a more stable estimate of variability.
Stepped recommendations based on expected effect size offer more precision. For a main trial designed with 90% power, you’d want about 75 participants per group in your pilot if you expect a very small effect, 25 per group for a small effect, and as few as 10 per group for a large effect. The logic is simple: the harder the effect is to see, the more pilot data you need to reliably estimate the variability around it.
Accounting for Dropouts
No study retains every participant. People move, lose interest, or become ineligible. If you calculate a sample size of 200 but 15% drop out, you’re left with 170, and your study may be underpowered. The standard adjustment is to inflate your initial recruitment target using the anticipated attrition rate. If you expect a dropout rate of ξ (say, 0.15 for 15%), multiply your calculated sample size by 1 / (1 − ξ). For a 15% dropout rate, that’s 1 / 0.85, or roughly 1.18. So your target of 200 becomes 236.
This simple multiplier is slightly conservative and may result in a study with a bit more power than you technically need. That’s generally preferable to the alternative. Attrition rates vary by field: longitudinal studies over months or years often see 20% to 30% loss, while a one-time survey might lose only 5%.
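The inflation step is a one-liner; rounding up rather than to the nearest integer keeps the adjustment on the conservative side. The function name is illustrative:

```python
import math

def inflate_for_dropout(n_required: int, dropout_rate: float) -> int:
    """Recruitment target so that n_required participants remain after attrition:
    multiply by 1 / (1 - dropout_rate) and round up."""
    return math.ceil(n_required / (1 - dropout_rate))

print(inflate_for_dropout(200, 0.15))   # 236, matching the example above
print(inflate_for_dropout(200, 0.30))   # a 30% attrition rate pushes this to 286
```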
Sample Size for Surveys vs. Experiments
Surveys and experiments use different logic, even though the underlying statistics overlap.
For a survey estimating a proportion (like the percentage of people who support a policy), the key inputs are confidence level, margin of error, and your best guess at the true proportion. If you have no idea what the proportion might be, use 50%, which produces the largest (most conservative) sample size. At 95% confidence and a 3% margin of error, this gives you roughly 1,000 respondents for a large population. At 90% confidence with the same margin, about 750 is sufficient.
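The survey calculation is the standard formula n = z²·p(1 − p)/e², which Python's standard library can evaluate directly. The function below is a sketch (its name is mine); note it returns the exact ceiling, slightly above the rounded benchmarks quoted above:

```python
from math import ceil
from statistics import NormalDist

def survey_sample_size(confidence: float, margin: float, p: float = 0.5) -> int:
    """Large-population sample size for estimating a proportion.
    p = 0.5 is the most conservative (largest-sample) assumption."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(survey_sample_size(0.95, 0.03))   # the "roughly 1,000" benchmark
print(survey_sample_size(0.90, 0.03))   # the "about 750" benchmark
```

Passing a prior estimate like `p=0.3` instead of the default 0.5 shrinks the result, which is why any defensible prior knowledge about the proportion saves respondents.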
For an experiment comparing groups (does this drug work better than placebo?), the key inputs shift to power, alpha, and effect size. The calculation is less about margin of error and more about having enough data to distinguish a real effect from noise. A two-group comparison expecting a medium effect size (Cohen’s d of 0.50) at 80% power and alpha of 0.05 typically needs around 64 participants per group, or 128 total. Drop the expected effect to small (0.20) and that number jumps to roughly 400 per group.
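The per-group numbers above come from solving the power equation for n. A normal-approximation sketch is n = 2(z₁₋α/₂ + z_power)²/d² per group; it lands one or two participants below the exact t-test answer (63 versus 64 for a medium effect), so tools like G*Power will report slightly larger figures. The function name is illustrative:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Normal-approximation per-group sample size for a two-sided,
    two-sample comparison with standardized effect size d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

print(n_per_group(0.5))   # 63 per group — near the exact t-test's 64
print(n_per_group(0.2))   # 393 per group — a small effect balloons the count
```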
Qualitative Research Has Different Rules
If you’re conducting interviews or focus groups rather than collecting numerical data, sample size is guided by saturation, the point where new conversations stop producing new themes or insights. A systematic review of empirical saturation studies found that 9 to 17 interviews typically reach saturation for studies with a relatively homogeneous population and well-defined research questions. For focus groups, 4 to 8 sessions is the usual range.
Studies with more diverse populations or broader research objectives may need more. Multi-country research and studies looking for deeper layers of meaning beyond initial codes are the most common outliers requiring larger samples.
Tools That Do the Math for You
You don’t need to work through formulas by hand. G*Power is the most widely recommended free tool for calculating sample size and power across a range of statistical tests, including t-tests, chi-square tests, F-tests, and more. It has a visual interface, so you don’t need programming skills. You enter your expected effect size, alpha, and desired power, and it returns the required sample size.
For R users, the “pwr” package handles similar calculations through code. Most commercial statistical software (SPSS, SAS, Stata) includes sample size modules as well, though they come with licensing costs. If you’re running a simple survey, many free online calculators will handle the proportion-based formula. Just make sure whatever tool you use lets you specify all four core inputs rather than hiding assumptions behind defaults.
Common Mistakes to Avoid
The most frequent error is skipping the calculation entirely and choosing a round number that “feels right.” A sample of 100 is not inherently adequate, and a sample of 1,000 is not inherently better. The right number depends entirely on what you’re measuring and how precise you need to be.
Another common mistake is using a convenience sample and then performing a power calculation after the fact to justify it. Post-hoc power analysis (calculating power after you already have your results) is widely considered misleading because it’s mathematically redundant with your p-value and tells you nothing new. The calculation only works as a planning tool, done before data collection.
Finally, ignoring the effect size or defaulting to “large” because it produces the smallest, cheapest sample is tempting but risky. If the true effect turns out to be small or medium, your study won’t have the power to find it, and you’ll conclude nothing works when it might have. Be honest about the likely effect size, and if that means you can’t afford the sample, it’s better to know before you start.

