The sample size you need depends on three things: how small a difference you want to detect, how confident you need to be in the result, and how much natural variation exists in what you’re measuring. For most practical purposes, the answer ranges from a dozen participants for a simple qualitative study to several thousand for detecting tiny effects in quantitative research. There’s no single magic number, but there is a straightforward logic that gets you to the right one.
The Three Inputs That Determine Your Number
Every sample size calculation, whether for a clinical trial, a classroom experiment, or a website A/B test, boils down to the same core ingredients.
Effect size is how big a difference or relationship you expect to find. A large, obvious effect (like the difference between taking a painkiller and taking nothing) needs fewer people to detect. A subtle effect (like a 1% improvement in click-through rate) needs far more. Cohen’s widely used benchmarks classify effect sizes as small (0.2), medium (0.5), or large (0.8). These aren’t rigid rules, but they give you a starting framework when you have no pilot data to work from.
Significance level (alpha) is the risk you’re willing to accept of finding a difference that isn’t actually real, a false positive. The standard in most fields is 5% (written as 0.05), meaning you accept a 1-in-20 chance of a false alarm. Some fields, especially medicine, use a stricter 1% threshold.
Statistical power is your chance of catching a real difference when one truly exists. The convention is 80%, which means you’ll correctly detect a real effect 4 out of 5 times. For high-stakes decisions, like a pivotal drug trial, researchers often raise this to 90%. Higher power requires more participants.
Ballpark Numbers for Common Scenarios
To give you a feel for how these inputs translate into actual numbers, here’s what happens when you hold alpha at 0.05 and power at 80%, then vary the effect size. These are for a basic two-group comparison (like treatment vs. control):
- Large effect (0.8): roughly 25–30 people per group
- Medium effect (0.5): roughly 60–65 per group
- Small effect (0.2): roughly 390–400 per group
Notice the pattern: as the effect you’re trying to detect gets smaller, the required sample size doesn’t just increase, it explodes. Cutting the effect size in half roughly quadruples the number of participants you need. This is the single most important relationship in sample size planning.
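These ballpark figures follow from a standard normal-approximation formula for a two-sided, two-group comparison. Here is a minimal sketch in Python using only the standard library; the function name `n_per_group` is my own, and the formula slightly underestimates the exact t-test answer for very small samples:

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-group comparison
    (normal approximation to the two-sample t-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(n_per_group(0.8))  # large effect  -> 25
print(n_per_group(0.5))  # medium effect -> 63
print(n_per_group(0.2))  # small effect  -> 393
```

Because the effect size appears squared in the denominator, halving it quadruples the result, which is exactly the explosion described above.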
Sample Sizes for A/B Tests and Conversion Rate Experiments
If you’re running an A/B test on a website or app, the same logic applies, but the language shifts slightly. Instead of “effect size,” you’ll work with something called the minimum detectable effect (MDE), which is the smallest improvement you care about catching. If your baseline conversion rate is 5% and you want to detect a lift to 5.5%, that 0.5 percentage point difference is your MDE.
The relationship between MDE and sample size is dramatic. In one worked example from statistical modeling, detecting an effect of 1.0 required only 24 observations, while detecting an effect of 0.5 required 96, an effect of 0.2 required 600, and an effect of 0.1 required 2,400. Trying to detect an effect as small as 0.02 pushed the requirement to nearly 60,000. This is why experienced optimization teams set a realistic MDE before launching a test. Chasing tiny improvements with an undersized sample wastes time and produces unreliable results.
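For conversion-rate tests specifically, a common normal-approximation formula for comparing two proportions turns a baseline rate and an MDE into a per-variant sample size. A minimal sketch, assuming the 5% → 5.5% example above; `n_per_variant` is my own name:

```python
import math
from statistics import NormalDist

def n_per_variant(p_baseline, p_target, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a lift from
    p_baseline to p_target (two-sided z-test on two proportions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance
                     / (p_target - p_baseline) ** 2)

# Detecting a 0.5 percentage point lift from a 5% baseline takes
# roughly 31,000 visitors per variant
print(n_per_variant(0.05, 0.055))
```

Doubling the MDE (5% → 6%) cuts the requirement by roughly a factor of four, which is why picking a realistic MDE up front matters so much.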
Sample Sizes for Surveys and Descriptive Studies
Surveys follow different math because you’re estimating a proportion or average rather than comparing groups. The key inputs are your desired margin of error (how precise you want your estimate) and how variable your population is. For a simple yes/no question where you assume maximum variability (50/50 split), you need about 385 responses for a ±5% margin of error at the 95% confidence level. Tightening the margin to ±3% pushes you to around 1,067. Tightening to ±1% requires roughly 9,600.
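The survey figures above come from the standard margin-of-error formula for a proportion. A minimal sketch, with `survey_n` as my own name and the worst-case 50/50 split as the default:

```python
import math
from statistics import NormalDist

def survey_n(margin_of_error, p=0.5, confidence=0.95):
    """Sample size to estimate a proportion p to within
    +/- margin_of_error at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

print(survey_n(0.05))  # -> 385
print(survey_n(0.03))  # roughly 1,070
print(survey_n(0.01))  # roughly 9,600
```

Note the same squared relationship as before: shrinking the margin of error from ±5% to ±1% multiplies the requirement by twenty-five.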
If your total population is small, say under 10,000 people, you can apply what’s called a finite population correction, which reduces the sample size because your sample represents a larger fraction of the whole group. For example, surveying a company of 500 employees requires fewer responses than surveying a city of 500,000 to achieve the same precision.
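The finite population correction itself is a one-line adjustment. A minimal sketch applying it to the ±5% survey figure from above; `fpc_adjust` is my own name:

```python
import math

def fpc_adjust(n, population):
    """Apply the finite population correction to an
    infinite-population sample size n."""
    return math.ceil(n / (1 + (n - 1) / population))

# +/-5% margin at 95% confidence needs ~385 from a huge population,
# but far fewer from a 500-person company
print(fpc_adjust(385, 500))      # roughly 220
print(fpc_adjust(385, 500_000))  # essentially unchanged: 385
```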
Sample Sizes for Qualitative Research
Qualitative studies (interviews, focus groups, case studies) don’t use statistical formulas at all. Instead, researchers aim for “saturation,” the point where new interviews stop producing new themes or insights. A systematic review of empirical saturation studies found that most reached saturation within 9 to 17 interviews for individual discussions and 4 to 8 sessions for focus groups. This held true when the study population was relatively similar and the research question was narrowly defined.
Multi-site or cross-cultural studies, or those exploring broad, layered themes, typically need more. But if you’re conducting interviews with a fairly uniform group and have a focused question, planning for 12 to 15 interviews is a reasonable starting point.
Adjusting for Dropout
Real studies lose participants. People move, lose interest, or miss follow-up appointments. You need to recruit more people upfront to compensate, and the correct way to do this is slightly counterintuitive.
If your calculation says you need 500 completers and you expect a 10% dropout rate, the instinct is to add 10% and recruit 550. That’s actually a miscalculation. The correct formula divides by one minus the dropout rate: 500 ÷ (1 − 0.10) = 556. The difference is small at low dropout rates, but if you expect 30% attrition, the gap between the wrong method (650) and the right one (715) becomes meaningful. Always round up.
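The divide-don’t-add rule above can be written as a one-line helper; `recruit_target` is my own name:

```python
import math

def recruit_target(completers_needed, dropout_rate):
    """Number to recruit so the expected completers meet the
    target: divide by (1 - dropout), don't add dropout on top."""
    return math.ceil(completers_needed / (1 - dropout_rate))

print(recruit_target(500, 0.10))  # -> 556, not 550
print(recruit_target(500, 0.30))  # -> 715, not 650
```

The intuition: 10% of the *larger* recruited group drops out, not 10% of the original target, so 550 recruits would leave you with only 495 completers.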
How to Actually Calculate Your Number
You don’t need to solve these formulas by hand. G*Power is a free, widely used desktop application with a visual interface that handles sample size calculations for most common statistical tests, including t-tests, ANOVA, chi-square, and regression. You select your test type, plug in your alpha level, desired power, and expected effect size, and it returns the required sample size.
For A/B testing, most experimentation platforms (Optimizely, VWO, Google Optimize’s legacy tools) have built-in calculators. Evan Miller’s online sample size calculator is another popular free option for proportion-based tests.
If you genuinely have no idea what effect size to expect, start with Cohen’s medium benchmark (0.5 for group comparisons) as a default. Better yet, run a small pilot study of 10 to 20 participants to estimate the variability in your data, then use that estimate to calculate your real sample size. Pilot data almost always produces a more accurate and often smaller requirement than blind guessing.
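Turning pilot data into an effect size estimate is straightforward: Cohen’s d is the difference in group means divided by the pooled standard deviation. A minimal sketch with entirely hypothetical pilot scores; `cohens_d` and the data are my own:

```python
import math
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Cohen's d from two pilot samples, using the pooled
    standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical pilot scores from 10 participants per group
pilot_treatment = [72, 78, 69, 81, 75, 74, 80, 77, 70, 76]
pilot_control   = [68, 71, 65, 74, 70, 66, 72, 69, 67, 70]
print(round(cohens_d(pilot_treatment, pilot_control), 2))
```

The resulting d then feeds directly into whatever sample size calculator you use, in place of a blind benchmark guess. Keep in mind that effect size estimates from small pilots are noisy, so treat them as rough inputs rather than precise values.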
Common Mistakes That Lead to Wrong Answers
The most frequent error is skipping the calculation entirely and picking a round number that “feels right.” Studies with too few participants waste resources because they lack the power to detect real effects, and studies with too many waste resources by recruiting beyond what’s necessary.
Another common mistake is ignoring effect size and focusing only on confidence level. A 95% confidence level sounds rigorous, but it means nothing if your study is powered to detect only large effects and the real effect is small. Power and effect size matter just as much as alpha.
Finally, people often forget that sample size requirements scale with complexity. Comparing two groups needs fewer people than comparing four. Adding control variables to a regression increases the requirement. Subgroup analyses (looking at results separately for men and women, for example) split your sample across the subgroups, so each comparison runs on only a fraction of your data. If you plan to slice your data, you need to size for the smallest slice.

