The effect size you enter into a power analysis is the single input that most influences how many participants you need, yet it’s the one researchers struggle with most. A small effect (Cohen’s d of 0.2) can require nearly 400 participants per group, while a large effect (d of 0.8) may need only about 26. Getting this number wrong in either direction means you either waste resources on an oversized study or run one too small to detect anything meaningful. There are several reliable ways to arrive at a defensible effect size, and the best choice depends on what information you already have.
Why Effect Size Drives Everything
Power analysis balances four connected quantities: your sample size, your significance threshold (alpha, usually .05), your desired power (usually 80% or higher), and the effect size you expect to find. Fix any three, and the fourth is determined. In practice, alpha and power are set by convention, so the relationship boils down to effect size and sample size. They move in opposite directions: the smaller the effect you’re trying to detect, the more participants you need.
To illustrate how dramatic this relationship is, holding power constant at 80% and alpha at .05, a study expecting an effect size of 0.2 needs roughly 788 participants in total (394 per group). Bump that expected effect to 1.0 and you need about 34 in total. At 2.5, you need only 8. This is why a vague or careless effect size estimate can derail a study before data collection begins.
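To see where such numbers come from, here is a minimal sketch of the normal-approximation sample-size formula for a two-sample t-test; exact t-based calculations (like G*Power's) run one or two participants higher per group:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sample t-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.2))  # 393 per group (~786 total; exact t-based: 788)
print(n_per_group(1.0))  # 16 per group (exact t-based total: 34)
```

The small gap between these numbers and the exact values in the text is the normal approximation; for planning purposes, either is close enough to see how steeply n climbs as d shrinks.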
Option 1: Use Published Research in Your Area
The strongest approach is to base your effect size on empirical data from studies asking similar questions in similar populations. Search for meta-analyses or systematic reviews in your topic area and look at the pooled effect sizes they report. These estimates synthesize findings across multiple studies, making them more stable than any single result.
This matters because generic benchmarks often overestimate what you’ll find. A large-scale analysis of meta-analyses in gerontology, for example, found that the real-world distribution of effect sizes was consistently smaller than Cohen’s classic guidelines would suggest. The empirically derived benchmarks for group differences in that field were d = 0.16 (small), 0.38 (medium), and 0.76 (large), compared to Cohen’s 0.20, 0.50, and 0.80. For correlations, the field-specific values were r = .12, .20, and .32, versus Cohen’s .10, .30, and .50. A researcher expecting a “medium” correlation of .30 and finding the true value is closer to .20 would need 193 participants instead of the 85 that Cohen’s benchmark implies.
The takeaway: look for effect sizes observed in your specific research area rather than relying on one-size-fits-all rules. If a meta-analysis reports that interventions of your type typically produce a d of 0.35, that’s your best starting point.
Option 2: Define the Smallest Meaningful Difference
In clinical and applied research, the most principled approach is to specify the minimum clinically important difference (MCID), the smallest change in an outcome that would actually matter to patients or practitioners. This reframes the question from “what effect can I detect?” to “what effect is worth detecting?”
For example, if you’re studying a pain intervention and clinicians agree that a reduction of at least 2 points on a 10-point scale would change treatment decisions, that 2-point difference is your target. You then divide it by the standard deviation of the outcome measure (drawn from prior research) to get a standardized effect size you can plug into software. If the standard deviation of pain scores in your population is typically 4 points, your effect size is 2/4 = 0.50.
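The MCID-to-effect-size step is just a division; a quick sketch using the pain-scale numbers above:

```python
def mcid_to_d(mcid, sd):
    """Standardize a raw minimum clinically important difference to Cohen's d."""
    return mcid / sd

# A 2-point reduction on a 10-point pain scale, with a typical SD of 4 points.
print(mcid_to_d(2, 4))  # -> 0.5
```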
This approach keeps you focused on practical significance rather than statistical significance, which is the whole point of powering a study. A statistically significant result that falls below the MCID isn’t useful to anyone making real decisions.
Option 3: Cohen’s Benchmarks as a Last Resort
When you have no prior data and no way to define a meaningful difference, Cohen’s conventional benchmarks provide a fallback. These are the defaults most power analysis tools offer:
- Cohen’s d (t-tests comparing means): 0.20 small, 0.50 medium, 0.80 large
- Cohen’s f (ANOVA): 0.10 small, 0.25 medium, 0.40 large
- f² (multiple regression): 0.02 small, 0.15 medium, 0.35 large
- Pearson’s r (correlations): .10 small, .30 medium, .50 large
- Partial eta squared (ANOVA variance explained): .01 small, .06 medium, .14 large
Cohen himself said these should only be used when no specific information is available. A growing consensus among researchers reinforces this caution. In a recent survey, a majority of methodologists recommended alternative strategies over conventions, and on an 11-point scale, respondents rated the urgency of revising existing benchmarks at a median of 8. The replication crisis has made it clear that many published effects are inflated, and conventions built on that literature can lead you astray.
If you do use benchmarks, choose conservatively. Powering for a “medium” effect when the true effect is small means your study will be underpowered. When in doubt, lean toward the small end.
Why Pilot Studies Are Risky for This Purpose
It’s tempting to run a small pilot study, calculate the effect size, and use that number to plan your main study. This approach has serious problems. Pilot samples are usually small and unrepresentative, which makes their effect size estimates unstable. A pilot with 30 people per group can easily produce an effect size that’s double or half the true value, leading to a power calculation that’s wildly off.
Current guidelines recommend against using pilot studies to estimate effect sizes for power calculations. If estimating group differences is a goal of your pilot, you’d need 70 to 100 participants per group to get reasonable precision, at which point it’s no longer a pilot. Pilot studies are better used to assess feasibility (recruitment rates, dropout, measurement procedures) than to generate the numbers that drive your sample size calculation.
Matching Effect Size to Your Statistical Test
Power analysis software like G*Power requires the effect size in a specific format depending on your planned analysis. Using the wrong metric is a common mistake.
For a t-test comparing two independent groups, you need Cohen’s d: the difference between the two group means divided by the pooled standard deviation. If you expect Group A to score 75 and Group B to score 70, and the standard deviation in both groups is about 10, your d is (75 – 70) / 10 = 0.50.
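As code, the same calculation (the group means and SD are the hypothetical values from the example):

```python
def cohens_d(mean_a, mean_b, pooled_sd):
    """Cohen's d for two independent groups: mean difference over pooled SD."""
    return (mean_a - mean_b) / pooled_sd

print(cohens_d(75, 70, 10))  # -> 0.5
```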
For a paired t-test (before-and-after in the same people), the effect size, called dz, is the mean change divided by the standard deviation of the difference scores, which accounts for the correlation between measurements. Because repeated measures on the same person are correlated, the effective variability is smaller, and the same raw difference translates to a larger standardized effect. You’ll need an estimate of how strongly the two measurements correlate, which you can often find in published studies using similar designs.
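If you have raw numbers, dz can be sketched directly from the standard deviation of the difference scores; the means, SDs, and correlations below are hypothetical:

```python
from math import sqrt

def dz(mean_diff, sd_pre, sd_post, r):
    """Paired-samples effect size: mean change over SD of the difference scores."""
    sd_diff = sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * r * sd_pre * sd_post)
    return mean_diff / sd_diff

# Same 5-point raw change, SD of 10 at both time points:
print(round(dz(5, 10, 10, 0.5), 2))  # r = .50 -> dz = 0.5
print(round(dz(5, 10, 10, 0.7), 2))  # r = .70 -> dz = 0.65 (higher r, larger dz)
```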
For a one-way ANOVA, G*Power uses Cohen’s f, which captures how spread out the group means are relative to within-group variability. If you have specific group means in mind, you can compute f directly: calculate the standard deviation of those predicted means and divide by the expected within-group standard deviation.
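With predicted group means in hand, that computation is short; note that Cohen's f treats the k predicted means as a population, so the SD divides by k, not k − 1 (the three-group means below are hypothetical):

```python
from statistics import pstdev  # population SD: divides by k, not k - 1

def cohens_f(group_means, sd_within):
    """Cohen's f: spread of predicted group means over within-group SD."""
    return pstdev(group_means) / sd_within

# Hypothetical three-group design: expected means 70, 75, 80; within-group SD 10.
print(round(cohens_f([70, 75, 80], 10), 3))  # -> 0.408, "large" by convention
```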
For regression, the effect size f² is calculated from your expected R² value: f² = R² / (1 – R²). If you expect your predictors to explain 10% of the variance (R² = .10), then f² = .10 / .90 = .11, which falls between a small and medium effect.
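The f² calculation is a one-liner; a sketch with the R² = .10 example:

```python
def f_squared(r2):
    """Cohen's f-squared for multiple regression, from expected R-squared."""
    return r2 / (1 - r2)

print(round(f_squared(0.10), 2))  # -> 0.11, between small (.02) and medium (.15)
```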
Converting Between Effect Size Metrics
Sometimes the published literature reports effect sizes in a different metric than your software requires. A few conversions are straightforward: simple formulas link Cohen’s d to Pearson’s r, and odds ratios to d. One widely used conversion: take the natural log of an odds ratio and divide by 1.81 to get an approximate Cohen’s d. If a meta-analysis reports an odds ratio of 2.0 for your intervention, ln(2.0) = 0.69, and 0.69 / 1.81 ≈ 0.38, giving you a d you can enter into G*Power.
To convert between partial eta squared (η²) and Cohen’s f for ANOVA designs, use f = √(η² / (1 – η²)). If a prior study reports η² = .06 (a medium effect), then f = √(.06 / .94) = √(.064) ≈ 0.25, which matches the medium convention for f.
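Both conversions fit in a few lines; a sketch (the 1.81 divisor is approximately pi / sqrt(3), the logistic-to-normal scaling constant):

```python
from math import log, sqrt

def odds_ratio_to_d(odds_ratio):
    """Approximate Cohen's d from an odds ratio: ln(OR) / 1.81."""
    return log(odds_ratio) / 1.81

def eta_squared_to_f(eta_sq):
    """Cohen's f from (partial) eta squared, for ANOVA designs."""
    return sqrt(eta_sq / (1 - eta_sq))

print(round(odds_ratio_to_d(2.0), 2))    # -> 0.38
print(round(eta_squared_to_f(0.06), 2))  # -> 0.25
```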
A Practical Workflow
Start by identifying the statistical test you plan to run, since this determines which effect size metric you need. Then work through these sources in order of preference: first, look for meta-analyses or large studies in your specific area and extract the relevant effect size directly. Second, if your research has a clinical or applied goal, define the smallest difference that would be practically meaningful and convert it to a standardized metric using the expected standard deviation. Third, if neither source is available, use Cohen’s benchmarks, but lean toward the lower end and be transparent about this choice in your write-up.
Whichever approach you use, run a sensitivity analysis: calculate your required sample size for a range of plausible effect sizes rather than a single point estimate. If you need 64 participants per group for d = 0.50 but 393 per group for d = 0.20, and you’re not confident the effect is truly medium, you have a clearer picture of the risk you’re taking. This gives reviewers and funders confidence that you’ve thought carefully about uncertainty rather than just plugging in a convenient number.
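A sensitivity table takes only a few lines. This sketch uses the normal-approximation sample-size formula for a two-sample t-test, so exact t-based values (e.g., from G*Power) run one or two participants higher per group:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sample t-test (normal approximation)."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

# Required sample size across a range of plausible effect sizes:
for d in (0.2, 0.3, 0.4, 0.5, 0.6):
    print(f"d = {d:.1f}: {n_per_group(d):>3} per group")
```

Reporting the whole range, rather than a single n, is what makes the uncertainty visible to reviewers.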