What Is Cohen’s d and How Do You Interpret It?

Cohen’s d is a number that tells you how big the difference is between two groups, expressed in standard deviation units. If a study finds that a treatment group scored higher than a control group, Cohen’s d answers the natural follow-up question: higher by how much? A d of 0.5, for instance, means the two group averages are half a standard deviation apart.

This makes Cohen’s d one of the most widely used “effect size” measures in research. It shows up constantly in psychology, education, medicine, and the social sciences, and understanding it gives you a much better sense of what a study actually found than a p-value alone.

How Cohen’s d Is Calculated

The formula is straightforward in concept. You take the difference between two group means and divide it by the pooled standard deviation of both groups:

d = (Mean₁ − Mean₂) / Pooled Standard Deviation

The numerator is simple: subtract one group’s average from the other. The denominator, the pooled standard deviation, combines the spread of scores from both groups into a single number. It is essentially a weighted average of the two groups’ variances, with each group weighted by its degrees of freedom (its sample size minus one), converted back into a standard deviation. By dividing the mean difference by this shared yardstick, you get a result that’s unitless. It doesn’t matter whether the original scores were in pounds, test points, or milliseconds. Cohen’s d puts everything on the same scale, which is why researchers can compare effect sizes across completely different studies.
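To make the formula concrete, here is a minimal Python sketch using only the standard library. The function names (`pooled_sd`, `cohens_d`) and the two lists of scores are hypothetical, made up for illustration. The pooled standard deviation follows the usual definition, weighting each group’s sample variance by n − 1. Multiplying every score by 10, as if switching units, leaves d unchanged, which is the unitless property described above.

```python
import math
from statistics import mean, variance  # variance() divides by n - 1 (sample variance)


def pooled_sd(group1, group2):
    """Combine two groups' spread into one SD, weighting each group's
    sample variance by its degrees of freedom (n - 1)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled SD."""
    return (mean(group1) - mean(group2)) / pooled_sd(group1, group2)


# Hypothetical scores, made up for illustration only.
treatment = [82, 85, 88, 90, 95]
control = [78, 80, 84, 86, 87]

d = cohens_d(treatment, control)
# Multiplying every score by 10 (a change of units) leaves d unchanged.
d_rescaled = cohens_d([x * 10 for x in treatment], [x * 10 for x in control])
print(f"d = {d:.2f}, rescaled d = {d_rescaled:.2f}")
```

With these made-up numbers the means differ by 5 points and the pooled standard deviation is about 4.44, so d comes out a little above 1.1 in both cases.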

What the Numbers Mean

In 1988, the statistician Jacob Cohen proposed a simple set of benchmarks that are still the most commonly cited reference points:

0.2 = small effect. The difference exists but is subtle. You probably wouldn’t notice it by looking at the two groups casually.
0.5 = medium effect. The difference is noticeable and often practically meaningful.
0.8 = large effect. The groups are clearly different, with substantial separation between their averages.

A concrete way to picture this, assuming both groups’ scores are roughly normally distributed with similar spread: a Cohen’s d of 0.5 means the average person in the higher-scoring group outperforms about 69% of people in the other group. At 0.8, the average person in the higher group outperforms roughly 79% of the other group. At 0.2, the overlap between groups is so large that the average person in the higher group outperforms only about 58% of the other.
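These percentages come from the standard normal cumulative distribution function Φ: if both groups are normal with equal standard deviations, the higher group’s mean sits at the Φ(d) percentile of the lower group, a quantity Cohen called U3. A short sketch using Python’s `statistics.NormalDist`; the function name `percent_outperformed` is hypothetical.

```python
from statistics import NormalDist


def percent_outperformed(d):
    """Percentage of the lower group scoring below the higher group's mean
    (Cohen's U3). Assumes normal distributions with equal SDs."""
    return NormalDist().cdf(abs(d)) * 100  # standard normal CDF evaluated at |d|


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {percent_outperformed(d):.0f}%")
# d = 0.2: 58%
# d = 0.5: 69%
# d = 0.8: 79%
```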

Cohen himself cautioned that these benchmarks should only be used when you have no better frame of reference. In some fields, a d of 0.2 is a genuinely important finding. In others, 0.5 might be unremarkable. Researchers in gerontology, for example, found that empirical benchmarks in their field were closer to 0.16, 0.38, and 0.76 for small, medium, and large effects. A study of heart rate variability in case-control designs found values of 0.26, 0.51, and 0.88. The “right” interpretation depends heavily on the context of the research.
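Because the thresholds vary by field, it can help to treat them as parameters rather than constants. In this minimal sketch, `label_effect` is a hypothetical helper whose defaults are Cohen’s 0.2 / 0.5 / 0.8; the alternative thresholds passed in below are the gerontology values quoted above. The label “below small” for values under the smallest threshold is an illustrative choice, not a standard category.

```python
def label_effect(d, small=0.2, medium=0.5, large=0.8):
    """Map Cohen's d to a size label using configurable thresholds.
    The sign of d only says which group scored higher, so it is ignored."""
    size = abs(d)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"  # illustrative label, not a Cohen category


print(label_effect(0.4))                    # Cohen's defaults -> small
print(label_effect(0.4, 0.16, 0.38, 0.76))  # gerontology benchmarks -> medium
```

The same d = 0.4 counts as “small” by Cohen’s defaults but “medium” against the gerontology benchmarks.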

Why Effect Size Matters More Than You Think

Most people who encounter research results see p-values first. A p-value tells you how likely it would be to observe a difference at least as large as the one found if there were truly no difference between the groups. A small p-value means the result would rarely arise from random chance alone. What it does not tell you is how big that difference is. A p-value of 0.001 might sound impressive, but it could reflect a tiny, meaningless difference that only reached “statistical significance” because the study had 10,000 participants. As one widely cited paper in the Journal of Graduate Medical Education put it: “With a sufficiently large sample, a statistical test will almost always demonstrate a significant difference, unless there is no effect whatsoever.”

This is the core problem Cohen’s d solves. Unlike a p-value, an effect size does not shrink or swell just because more people were studied. A Cohen’s d of 0.3 describes the same magnitude of difference whether the study had 50 participants or 50,000, although the estimate from the smaller study is less precise. The p-value, by contrast, is “confounded” by sample size: the same real-world difference will produce a tiny, highly significant p-value in a huge study and a large, non-significant p-value in a small one.
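A quick calculation shows the sample-size effect. For two equal-sized groups, the pooled two-sample t statistic equals d × √(n/2), where n is the number of participants per group. The sketch below holds d fixed at 0.3 and converts that statistic into a two-sided p-value using a normal approximation. This is a simplification: a real t-test uses the t distribution, which gives slightly larger p-values in small samples. The function name `approx_p_value` is hypothetical.

```python
import math


def approx_p_value(d, n_per_group):
    """Two-sided p-value for an observed Cohen's d with two equal-sized groups.
    t = d * sqrt(n/2) is the pooled two-sample t statistic; the p-value uses a
    normal approximation to the t distribution (reasonable for large n)."""
    t = d * math.sqrt(n_per_group / 2)
    # Two-sided normal tail: 2 * (1 - Phi(|t|)) == erfc(|t| / sqrt(2)).
    # erfc stays accurate for very large t, where 1 - Phi(t) would round to 0.
    return math.erfc(abs(t) / math.sqrt(2))


# Same effect size (d = 0.3) with 50, 500, and 50,000 participants in total.
for n in (25, 250, 25_000):
    print(f"{2 * n:>6} participants: p ≈ {approx_p_value(0.3, n):.2g}")
```

The effect size never changes, yet the p-value drops from non-significant to vanishingly small purely because the sample grows.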

Think of it this way. The p-value answers: “Is this difference more than random noise?” Cohen’s d answers: “Is this difference big enough to care about?” You need both pieces of information to evaluate a study properly, but effect size is arguably the more useful one for practical decision-making. A drug that lowers blood pressure with a Cohen’s d of 0.1 “works” in a statistical sense but may not justify its cost or side effects. A teaching method with a d of 0.7 is likely producing changes that students and teachers would actually notice in the classroom.

Cohen’s d vs. Hedges’ g

If you’ve seen Hedges’ g mentioned alongside Cohen’s d, they measure the same thing with one key difference. Cohen’s d slightly overestimates the true effect size when sample sizes are small. Hedges’ g applies a correction factor that adjusts for this bias. With large samples (roughly 20 or more per group), the two numbers are nearly identical. With smaller samples, Hedges’ g is considered the more accurate estimate. Both use the same 0.2 / 0.5 / 0.8 interpretation benchmarks.
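One common form of the correction multiplies d by J ≈ 1 − 3 / (4(n₁ + n₂) − 9), which is an approximation to an exact formula based on the gamma function. The sketch below uses that approximation. `hedges_correction` and `hedges_g` are hypothetical helpers, and `hedges_g` takes an already-computed d plus the two group sizes. Printing the factor for a few group sizes shows why the two measures converge as samples grow.

```python
def hedges_correction(n1, n2):
    """Approximate small-sample correction factor J = 1 - 3 / (4 * df - 1),
    with df = n1 + n2 - 2, which simplifies to 1 - 3 / (4 * (n1 + n2) - 9)."""
    return 1 - 3 / (4 * (n1 + n2) - 9)


def hedges_g(d, n1, n2):
    """Shrink Cohen's d toward zero to offset its small-sample upward bias."""
    return d * hedges_correction(n1, n2)


for n in (5, 20, 100):
    print(f"n = {n} per group: J = {hedges_correction(n, n):.3f}, "
          f"g for d = 0.5 is {hedges_g(0.5, n, n):.3f}")
```

With 5 per group the correction trims d by roughly 10%; with 20 per group it is about 2%, and with 100 per group it is well under 1%.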

Another related measure is Glass’s delta, which uses only the standard deviation of the control group rather than pooling both groups together. This is sometimes preferred when the two groups have very different levels of variability, since pooling their standard deviations in that situation can be misleading.
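Glass’s delta changes only the denominator. In the hypothetical data below, the treatment group is far more variable than the control group, so the choice of yardstick matters a great deal. Glass’s delta comes out around 2.6, while the pooled-SD Cohen’s d for the same numbers is about 0.87 (a 10-point mean difference divided by a pooled standard deviation of 11.5). The function name and data are illustrative assumptions.

```python
from statistics import mean, stdev  # stdev() is the sample SD (n - 1 denominator)


def glass_delta(treatment, control):
    """Mean difference standardized by the control group's SD alone."""
    return (mean(treatment) - mean(control)) / stdev(control)


# Hypothetical data: same-size groups, but the treatment group is much more spread out.
treatment = [70, 80, 90, 100, 110]
control = [75, 78, 80, 82, 85]

print(f"Glass's delta = {glass_delta(treatment, control):.2f}")
```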

When Cohen’s d Works Best

Cohen’s d is designed for comparing two groups on a continuous measure, like test scores, reaction times, or blood pressure readings. It assumes the data in each group is roughly normally distributed and that both groups have similar levels of variability. When those assumptions hold, the pooled standard deviation in the denominator accurately represents the spread of both groups, and the resulting d value is easy to interpret.

When the two groups have very different variances, the pooled standard deviation becomes a less meaningful summary, and alternatives like Glass’s delta may be more appropriate. For studies with more than two groups, other effect size measures (like eta-squared for ANOVA designs) are typically used instead. And for research that examines relationships rather than group differences, Pearson’s r is the standard effect size metric, with its own set of benchmarks: 0.10, 0.30, and 0.50 for small, medium, and large.

How to Read Cohen’s d in a Study

When you encounter Cohen’s d in a research paper or news article about a study, you can interpret it in a few practical ways. First, check its sign. A positive or negative value simply reflects which group scored higher. The absolute value is what tells you the size of the effect. Second, compare it to the 0.2 / 0.5 / 0.8 benchmarks for a rough sense of magnitude, but keep the research context in mind. In fields where interventions typically produce small effects (education policy, public health campaigns), a d of 0.3 can represent a meaningful real-world impact.

Third, remember that Cohen’s d describes group averages, not individuals. A d of 0.5 does not mean every person in one group outperformed every person in the other. Even at conventionally large effect sizes, the two groups overlap substantially: at d = 0.8, assuming normal distributions with equal spread, the two score distributions still share roughly 69% of their area. This is an important reality check when studies are reported with dramatic headlines. The effect can be real and statistically robust while still leaving many individuals in both groups with similar outcomes.
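The 69% figure is the overlapping coefficient: the area shared by two normal curves with equal standard deviations whose means are d standard deviations apart. Because the curves cross halfway between the means, this works out to 2Φ(−|d|/2). Here is a minimal sketch under those same normality and equal-variance assumptions; `overlap_percent` is a hypothetical name.

```python
from statistics import NormalDist


def overlap_percent(d):
    """Percentage of area shared by two equal-SD normal curves whose means are
    |d| SDs apart (the overlapping coefficient): 2 * Phi(-|d| / 2)."""
    return 2 * NormalDist().cdf(-abs(d) / 2) * 100


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {overlap_percent(d):.0f}% overlap")
# d = 0.2: 92% overlap
# d = 0.5: 80% overlap
# d = 0.8: 69% overlap
```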