How to Justify Sample Size in Quantitative Research

Justifying your sample size means showing reviewers and readers that you chose a number of participants for a defensible reason, not out of convenience. The gold standard is a formal power analysis, but it’s not the only valid approach. Depending on your study’s goals, you can justify sample size through statistical power, desired precision, pilot data, or even practical constraints, as long as you document your reasoning transparently.

The Four Components of a Power Analysis

A power analysis is the most widely accepted method for justifying sample size in quantitative research. It works by linking four interdependent values: your significance level (alpha), your desired statistical power, the expected effect size, and the sample size itself. If you know any three, you can calculate the fourth. In practice, researchers fix alpha and power at conventional levels, estimate the effect size, and solve for the sample size they need.

Alpha is the probability of a false positive, meaning you conclude there’s an effect when there isn’t one. The standard threshold is 0.05, though some fields use stricter values like 0.025 or 0.01. Power is the probability of detecting a real effect when one exists, calculated as one minus the probability of a false negative. The conventional minimum is 80%, and many studies aim for 90%. A review of randomized trials found that about 61% set power between 80% and 89%, while roughly 17% failed to report their power level at all. Reporting yours, and explaining why you chose it, immediately strengthens your justification.
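If you want to script the calculation rather than rely on point-and-click software, a minimal sketch in Python (assuming the statsmodels package is installed) shows the "fix three, solve for the fourth" logic. The 0.5 effect size here is purely illustrative.

```python
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Fix alpha, power, and the expected effect size (Cohen's d); solve for n per group.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(math.ceil(n_per_group))  # 64 per group for a two-sided independent-samples t-test

# The same call can solve for any missing component, e.g. the power you would
# have with only 50 participants per group.
achieved_power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=0.05,
                                      alternative='two-sided')
print(round(achieved_power, 2))  # falls short of the conventional 0.80
```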

Effect size is where most researchers struggle. It represents the magnitude of the difference or relationship you expect to find. A larger expected effect means you need fewer participants to detect it; a smaller one requires more. The next section covers how to estimate this crucial number.

How to Estimate Your Expected Effect Size

You have three main options for choosing an effect size, and the best choice depends on what prior information is available.

The strongest approach is to pull effect sizes from previous studies that used the same variables, measures, and population you plan to use. If three prior studies comparing the same two interventions found differences of 0.45, 0.52, and 0.38 standard deviations, you have a reasonable basis for estimating your own expected effect. Cite those studies explicitly in your methods section.

When no prior literature exists, you can use Cohen’s conventional benchmarks. For a t-test, small, medium, and large effects correspond to Cohen’s d values of 0.2, 0.5, and 0.8; for ANOVA, the conventions for Cohen’s f are 0.1, 0.25, and 0.4. These benchmarks are useful starting points, but reviewers will be more persuaded if you explain why a particular size is realistic for your context. Saying “we powered for a medium effect of 0.5 because our intervention is similar in scope to X” is much stronger than citing the benchmark alone.
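To see how strongly the benchmark you pick drives the result, the sketch below (again assuming Python with statsmodels) solves for the per-group sample size at each of Cohen’s t-test conventions.

```python
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"{label} effect (d = {d}): {math.ceil(n)} per group")
# Smaller expected effects demand far larger samples:
# roughly 394, 64, and 26 participants per group, respectively.
```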

A third option is to define the smallest effect that would be practically meaningful. This is sometimes called the minimally important difference. Instead of asking “what effect do I expect?”, you ask “what is the smallest result that would actually matter in practice?” If a training program needs to improve test scores by at least 5 points to be worth implementing, you power your study to detect that 5-point difference. This approach ties your sample size directly to real-world significance, which makes for a compelling justification.
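Translating a minimally important difference into a power calculation just means converting the raw difference into a standardized effect size. The sketch below assumes Python with statsmodels; the 5-point threshold comes from the example above, while the 12-point standard deviation of test scores is a hypothetical value you would replace with one from the literature or your own data.

```python
import math
from statsmodels.stats.power import TTestIndPower

mid_points = 5.0    # smallest improvement worth implementing (from the example above)
outcome_sd = 12.0   # hypothetical standard deviation of the test scores
d_mid = mid_points / outcome_sd   # Cohen's d for the minimally important difference

n_per_group = TTestIndPower().solve_power(effect_size=d_mid, alpha=0.05, power=0.80)
print(math.ceil(n_per_group))   # n per group to detect the 5-point difference at 80% power
```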

Using Pilot Data Carefully

Pilot studies have traditionally been used to estimate effect sizes for planning larger trials, but this practice has come under scrutiny. The core problem is that pilot samples are usually small and unrepresentative. The variance and effect size estimates they produce can be inaccurate, leading to misleading power calculations. A typical pilot with 30 participants per group is too small to provide reasonable precision. If you plan to use a pilot study specifically to estimate group differences for powering a larger trial, guidelines suggest you may need 70 to 100 participants per group.

The safer way to use pilot data is to focus on the variance of your outcome measure rather than the point estimate of the effect. You can use the observed variance to run sensitivity analyses across a range of plausible effect sizes, then check whether your variance estimate aligns with other studies using the same measures. Report the confidence interval around your variance estimate, not just the single number. This gives reviewers a transparent picture of the uncertainty involved. If your pilot was small, acknowledge that the estimates may shift with a larger sample and explain how you accounted for that uncertainty.
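Here is a sketch of that workflow, assuming Python with scipy and statsmodels; the pilot numbers are hypothetical, and the calculation treats the pilot variance as a rough stand-in for both groups.

```python
import math
from scipy.stats import chi2
from statsmodels.stats.power import TTestIndPower

n_pilot = 30     # pilot participants
s2 = 144.0       # observed outcome variance in the pilot (sd = 12)
df = n_pilot - 1

# 95% confidence interval for the variance, to report alongside the point estimate.
ci_low = df * s2 / chi2.ppf(0.975, df)
ci_high = df * s2 / chi2.ppf(0.025, df)
print(f"pilot variance {s2:.0f}, 95% CI [{ci_low:.0f}, {ci_high:.0f}]")

# Sensitivity analysis: required n per group across a range of plausible raw differences.
for diff in [3, 5, 7]:
    d = diff / math.sqrt(s2)
    n = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"difference of {diff} points: {math.ceil(n)} per group")
```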

Precision-Based Justification

Not every study is designed to reject a null hypothesis. If your goal is to estimate a population parameter, like a prevalence rate, a mean difference, or a proportion, you can justify sample size based on the precision you want in your estimate. This means choosing how narrow you want your confidence interval to be.

The logic is straightforward. A confidence interval has a margin of error, which is half its total width. If you want a 95% confidence interval no wider than 10 percentage points, your margin of error is 5 percentage points. You then solve for the sample size that achieves that margin given your expected variability. For proportions, the formula uses the standard normal value of 1.96 (for 95% confidence), the expected proportions, and your desired margin of error. If you have no idea what the proportions will be, you can assume maximum variability (0.5 for each proportion), which gives you a conservative, larger sample size.
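For the simplest case, a single proportion, the calculation fits in a few lines of plain Python. The 95% confidence level and 5-point margin follow the example above, and 0.5 is the conservative assumption for an unknown proportion.

```python
import math

z = 1.96        # standard normal value for 95% confidence
margin = 0.05   # desired margin of error (half the interval width)
p = 0.5         # assumed proportion; 0.5 gives the most conservative (largest) n

n = (z ** 2) * p * (1 - p) / margin ** 2
print(math.ceil(n))   # 385 respondents, before any finite population correction
```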

This approach is especially useful for descriptive studies, surveys, and epidemiological research where hypothesis testing isn’t the primary aim. When writing it up, state your target confidence level, your desired margin of error, the assumptions you made about variability, and the resulting sample size.

Working With Small or Fixed Populations

Standard sample size formulas assume you’re sampling from an effectively infinite population. When your target population is small and known (say, all 200 nurses at a specific hospital, or all 85 members of a professional organization), and you plan to sample more than 5% of them, you should apply the finite population correction factor. This adjustment reduces your required sample size because sampling a large share of a known population gives you more information per participant than sampling the same number from an unlimited pool.

The correction works by multiplying your standard error by a factor based on the ratio of your sample to the population. In practical terms, if your standard formula says you need 150 participants but the entire population is only 200, the corrected calculation will yield a smaller number. Always report the total population size, the correction you applied, and the adjusted sample size.
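Using the numbers above, a minimal sketch in plain Python applies the standard adjustment n / (1 + (n − 1) / N):

```python
import math

n_infinite = 150   # requirement from the standard (infinite-population) formula
population = 200   # total size of the known, fixed population

n_adjusted = n_infinite / (1 + (n_infinite - 1) / population)
print(math.ceil(n_adjusted))   # roughly 86 of the 200 nurses
```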

When Practical Constraints Set the Limit

Sometimes your sample size is capped by funding, time, or the number of available participants. This is common in feasibility studies, clinical research with rare conditions, and student dissertations. Using a constrained sample size is not inherently a weakness, but failing to acknowledge and address it is.

A review of feasibility studies found that rules of thumb were the most common sample size justification (8 out of 25 studies examined), and five studies gave unclear reasoning or none at all. Neither approach is convincing. If your sample size is fixed by practical limits, the best strategy is to reverse the power analysis: state your fixed sample size, then calculate and report the power you will have to detect various effect sizes. This lets reviewers see exactly what your study can and cannot detect. You can also report the minimum detectable effect size at 80% power given your available sample. If that minimum effect is still meaningful in your field, you have a solid justification.
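The sketch below shows that reversal, assuming Python with statsmodels; the fixed sample of 40 per group and the candidate effect sizes are hypothetical.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_fixed = 40   # participants per group that funding and recruitment allow

# Power you would have at this n across a range of plausible effect sizes.
for d in [0.3, 0.5, 0.7]:
    power = analysis.solve_power(effect_size=d, nobs1=n_fixed, alpha=0.05)
    print(f"d = {d}: power = {power:.2f}")

# Minimum detectable effect size at 80% power with the available sample.
mdes = analysis.solve_power(nobs1=n_fixed, alpha=0.05, power=0.80)
print(f"minimum detectable effect at 80% power: d = {mdes:.2f}")
```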

For feasibility studies specifically, researchers are increasingly encouraged to define clear progression criteria (benchmarks that determine whether a larger trial is warranted) and then calculate the operating characteristics of those criteria at the available sample size. This shifts the justification from “we need X participants for power” to “here is what our decision rules can tell us with the participants we have.”
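In practice, “operating characteristics” just means the probability that your progression rule gives the right signal under different true states of the world. The sketch below, assuming Python with scipy, evaluates a hypothetical retention criterion: progress to a full trial if at least 70% of 40 participants are retained.

```python
from scipy.stats import binom

n_available = 40    # participants the feasibility study can recruit
threshold = 0.70    # progression criterion: observed retention of at least 70%
cutoff = int(round(threshold * n_available))   # minimum retained participants to progress

# Probability of meeting the criterion under different true retention rates.
for true_rate in [0.60, 0.70, 0.80]:
    prob_progress = binom.sf(cutoff - 1, n_available, true_rate)   # P(retained >= cutoff)
    print(f"true retention {true_rate:.0%}: P(progress) = {prob_progress:.2f}")
```

Reporting these probabilities shows reviewers how likely your decision rule is to point you in the right direction with the sample you can actually recruit.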

Avoid Unsupported Rules of Thumb

Rules of thumb like “10 participants per predictor variable” in regression have been widely cited for decades. Research has shown these heuristics are unreliable. A study on prediction models stated directly that rules like 10 events per predictor “should be avoided,” because the actual sample size needed depends on the specific characteristics of your data: the number of predictors, the expected effect sizes, the outcome rate, and the degree of correlation among variables. Two studies with the same number of predictors can require very different sample sizes.

If a reviewer sees “we used the 10:1 rule” as your only justification, it signals that you didn’t engage with the specifics of your design. A formal power or precision calculation tailored to your actual statistical test will always be more persuasive.

Tools for Running the Calculations

G*Power is the most commonly recommended free software for sample size and power calculations. It covers t-tests, ANOVA, chi-square tests, z-tests, correlation, regression, and exact tests through a graphical interface that doesn’t require programming knowledge. When you select a statistical test, it automatically provides Cohen’s conventional effect size values as a reference point. The current version is 3.1.9.7.

For more complex designs (multilevel models, structural equation modeling, Bayesian analyses), R packages like “pwr,” “simr,” and “simstudy” offer greater flexibility but require some coding. Stata and SAS also have built-in power analysis modules. Whichever tool you use, report it in your methods section so reviewers can verify your calculation.

Writing the Justification in Your Methods Section

A complete sample size justification includes five elements: the statistical test you plan to use, the alpha level, the desired power (or desired precision), the expected effect size and its source, and the resulting sample size. A well-written version might read: “We calculated the required sample size for an independent-samples t-test using G*Power 3.1. With alpha set at 0.05, power at 0.80, and an expected medium effect size of 0.50 based on Smith et al. (2022), the minimum required sample was 64 per group (128 total). To account for an anticipated 15% attrition rate, we aimed to recruit 150 participants.”

Note the attrition adjustment in that example. If you expect participants to drop out, your initial recruitment target should be inflated accordingly: calculate your required analytic sample first, then increase it by your expected dropout rate. This small detail demonstrates thorough planning. If your study involves multiple analyses or outcomes requiring different sample sizes, use the largest required sample, and state this explicitly so it’s clear you powered for the most demanding comparison.
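As a final sketch in plain Python, here is the attrition inflation from the worked example above. Dividing the analytic sample by one minus the expected dropout rate is one common convention; multiplying by one plus the rate is a slightly less conservative alternative, and either result is typically rounded up to a convenient recruitment target.

```python
import math

n_analytic = 128   # total analytic sample from the power analysis (64 per group)
dropout = 0.15     # anticipated attrition rate

recruit = math.ceil(n_analytic / (1 - dropout))
print(recruit)     # 151; small differences from the 150 quoted above come down to rounding convention
```

However you arrive at the number, report both the analytic target and the inflated recruitment target so reviewers can follow the arithmetic.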