What Is a Meta-Analysis in Psychology, and How Does It Work?

A meta-analysis in psychology is a statistical method that combines the results of multiple independent studies on the same question to produce a single, more precise estimate of an effect. Instead of relying on one experiment with 50 participants, for example, a meta-analysis might pool data from 30 experiments involving thousands of participants total, giving researchers far more confidence in the overall finding. The technique has become one of the most influential tools in psychological science, used to settle debates about everything from whether therapy works to how screen time affects children.

How It Differs From a Systematic Review

A systematic review is a structured process for finding and evaluating every relevant study on a specific question. Researchers define search terms, screen hundreds or thousands of papers, and apply strict criteria to decide which studies qualify. A meta-analysis adds a statistical layer on top of that process: it takes the qualified studies and mathematically combines their data to calculate an overall effect size and a confidence interval around it.

Not every systematic review includes a meta-analysis. Sometimes the studies are too different from each other in design or measurement to be meaningfully combined with statistics. In those cases, researchers summarize the findings narratively, looking for themes and patterns without pooling numbers. But a meta-analysis is always built on a systematic review. You can’t combine data responsibly without first doing the careful work of identifying and vetting the studies that go into it.

Where the Method Came From

Gene Glass coined the term “meta-analysis” in 1976, and it made its mark on psychology a year later, when Mary Lee Smith and Glass published a landmark paper combining the results of psychotherapy outcome studies. At the time, there was genuine disagreement about whether psychotherapy was effective at all. Smith and Glass pooled findings across many trials and showed a clear, positive effect, essentially demonstrating that therapy works and quantifying how much. Their approach gave researchers a new way to move beyond the messy, contradictory results that pile up in any active area of research, and the method spread rapidly through psychology and medicine.

What Effect Size Means

The core output of a meta-analysis is an effect size, a standardized number that describes how large a difference or relationship is across the combined studies. Psychology most commonly uses two types. The first is a standardized mean difference (Cohen’s d or the closely related Hedges’ g), which measures how far apart two groups are, such as a therapy group versus a control group. The second is a correlation coefficient (Pearson’s r), which measures the strength of a relationship between two variables, like the link between sleep quality and anxiety.

Cohen’s d values of 0.20, 0.50, and 0.80 are conventionally interpreted as small, medium, and large effects. For correlations, the equivalent benchmarks are 0.10, 0.30, and 0.50. These guidelines give readers a rough sense of whether a finding is practically meaningful or just statistically detectable. Hedges’ g is often preferred over Cohen’s d because it corrects for bias that creeps in when individual studies have small sample sizes, but the two are directly comparable and interpreted the same way.
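To make the arithmetic concrete, here is a minimal sketch of how these two statistics could be computed from one study’s summary statistics. Everything below, including the numbers, is invented for illustration:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference between two independent groups."""
    # Pool the two groups' standard deviations, weighted by sample size
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def hedges_g(d, n1, n2):
    """Apply the small-sample bias correction that turns d into g."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))

# Invented therapy-vs-control study with 30 participants per arm
d = cohens_d(mean1=24.1, sd1=6.0, n1=30, mean2=20.5, sd2=6.2, n2=30)
g = hedges_g(d, n1=30, n2=30)
print(f"d = {d:.2f}, g = {g:.2f}")  # d ≈ 0.59, g ≈ 0.58: a medium effect
```

Note that g comes out slightly smaller than d; the correction matters most in small studies and becomes negligible as samples grow.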

How a Meta-Analysis Is Conducted

The process follows a fairly standard sequence. Researchers start by defining a precise research question and writing a protocol that specifies how they’ll search the literature, what databases they’ll use, and what criteria a study must meet to be included. Inclusion criteria typically cover things like study design (only randomized controlled trials, for example), the population studied, the type of intervention or variable measured, and a minimum level of methodological quality.

Next comes the search itself, which usually spans multiple databases and often includes hand-searching reference lists to catch studies that automated searches miss. Each study that turns up is screened, first by title and abstract, then by full text. Multiple reviewers typically work independently at each stage, with a process for resolving disagreements. Data extraction follows: reviewers pull the relevant statistics from each qualifying study, such as means, standard deviations, sample sizes, and any moderating variables. Finally, the extracted data are fed into statistical software that calculates the pooled effect size across all studies.
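That final pooling step is, at its core, a weighted average. Here is a minimal sketch of fixed-effect, inverse-variance pooling, assuming one effect size and its sampling variance have been extracted from each of five invented studies:

```python
import numpy as np

# Invented extraction results: one Hedges' g and its sampling
# variance per qualifying study
effects = np.array([0.10, 0.35, 0.62, 0.48, 0.90])
variances = np.array([0.020, 0.030, 0.025, 0.040, 0.030])

# Inverse-variance weights: more precise studies count for more
weights = 1.0 / variances
pooled = np.sum(weights * effects) / np.sum(weights)
se = np.sqrt(1.0 / np.sum(weights))
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled g = {pooled:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Dedicated software (R’s metafor package, for example) layers diagnostics and model options on top, but this weighted average is the heart of the computation.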

Fixed-Effects vs. Random-Effects Models

When combining study results, researchers choose between two statistical models. A fixed-effects model assumes that every study is estimating the exact same underlying effect, and that differences between study results are due only to chance. A random-effects model assumes that the true effect actually varies from study to study, perhaps because different populations, settings, or methods shift the result slightly. The random-effects model accounts for this extra layer of uncertainty.

In psychology, the random-effects model is generally the more appropriate choice, because psychological studies almost always differ in ways that could influence outcomes. A study on cognitive behavioral therapy conducted with college students in a university lab is not identical to one conducted with older adults in a hospital. The random-effects model captures that variability. Researchers may lean toward a fixed-effects model when there are very few studies available, because estimating the between-study variance becomes unreliable with limited data.
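As a sketch of how the random-effects adjustment works, the widely used DerSimonian-Laird estimator computes the between-study variance (tau²) from the spread of the study results and adds it to every study’s own variance before weighting. The five invented studies are the same as in the pooling sketch above:

```python
import numpy as np

# Same five invented studies as in the pooling sketch
effects = np.array([0.10, 0.35, 0.62, 0.48, 0.90])
variances = np.array([0.020, 0.030, 0.025, 0.040, 0.030])

# Fixed-effect weights are used to estimate the between-study variance
w = 1.0 / variances
fixed = np.sum(w * effects) / np.sum(w)
q = np.sum(w * (effects - fixed) ** 2)            # Cochran's Q
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (q - (len(effects) - 1)) / c)     # DerSimonian-Laird tau²

# Random-effects weights: every study's variance is widened by tau²
w_re = 1.0 / (variances + tau2)
pooled_re = np.sum(w_re * effects) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))
print(f"tau² = {tau2:.3f}, pooled g = {pooled_re:.2f} (SE {se_re:.2f})")
```

Because tau² inflates every study’s variance by the same amount, the weights become more equal, small studies get relatively more say, and the confidence interval widens to reflect the extra uncertainty.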

Measuring Inconsistency Between Studies

One of the most important questions in any meta-analysis is how much the individual study results disagree with each other. A statistic called I² (I-squared) answers this. It expresses the proportion of total variability in the meta-analysis that comes from genuine differences between studies rather than from random sampling error. An I² of 0% would mean all variability is just noise. An I² of 75% would mean three-quarters of the variability reflects real differences in study findings.
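I² falls directly out of Cochran’s Q, the weighted sum of squared deviations of each study from the pooled estimate. A short sketch, reusing the invented studies from the pooling example:

```python
import numpy as np

# Same five invented studies as in the pooling sketch
effects = np.array([0.10, 0.35, 0.62, 0.48, 0.90])
variances = np.array([0.020, 0.030, 0.025, 0.040, 0.030])

weights = 1.0 / variances
pooled = np.sum(weights * effects) / np.sum(weights)

# Cochran's Q measures total spread; df is what chance alone predicts
q = np.sum(weights * (effects - pooled) ** 2)
df = len(effects) - 1

# I²: the share of spread that exceeds sampling error, floored at zero
i2 = max(0.0, (q - df) / q) * 100
print(f"Q = {q:.1f} on {df} df, I² = {i2:.0f}%")  # roughly I² ≈ 72% here
```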

High heterogeneity doesn’t invalidate a meta-analysis, but it does signal that something interesting is going on. When I² is large, researchers typically dig into moderator analyses, splitting the studies into subgroups to figure out what’s driving the disagreement. Maybe the effect is strong in children but weak in adults, or robust with one type of measurement but not another. These subgroup findings are often the most useful part of a meta-analysis, because they reveal when and for whom an effect holds.
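In code, a basic moderator analysis simply repeats the pooling within each subgroup. In this invented example, splitting by age group reveals a strong effect in studies of children and a weak one in studies of adults:

```python
import numpy as np

# Invented studies tagged with a moderator (the sample's age group)
effects = np.array([0.65, 0.72, 0.58, 0.18, 0.25, 0.12])
variances = np.array([0.04, 0.05, 0.03, 0.04, 0.05, 0.03])
groups = np.array(["child", "child", "child", "adult", "adult", "adult"])

# Pool separately within each subgroup to see where the effect holds
for g in np.unique(groups):
    mask = groups == g
    w = 1.0 / variances[mask]
    pooled = np.sum(w * effects[mask]) / np.sum(w)
    print(f"{g}: pooled effect = {pooled:.2f} across {mask.sum()} studies")
```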

Reading a Forest Plot

The signature visual of a meta-analysis is the forest plot, a chart that displays each study’s result on its own line. Every study appears as a small box positioned along a horizontal axis. The center of the box marks that study’s point estimate of the effect, and the horizontal line running through the box shows the 95% confidence interval, the range within which the true effect likely falls. Larger boxes indicate studies that carry more weight in the analysis, usually because they had bigger sample sizes.

At the bottom of the plot sits a diamond. This represents the overall pooled effect from all the studies combined. The center of the diamond is the meta-analytic estimate, and its width shows the confidence interval for that combined result. A vertical line, usually at zero or one depending on the type of effect, marks “no effect.” If a study’s confidence interval crosses that line, the study alone didn’t find a statistically significant result. If the diamond doesn’t cross the line, the combined evidence points to a real effect.
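For readers who want to see that anatomy in code, here is a bare-bones forest plot drawn with matplotlib. The study results are invented, and real plots add study labels, weights, and formatting this sketch omits:

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented per-study estimates (Hedges' g) and standard errors
labels = ["Study A", "Study B", "Study C", "Study D", "Study E"]
effects = np.array([0.10, 0.35, 0.62, 0.48, 0.90])
ses = np.array([0.14, 0.17, 0.16, 0.20, 0.17])

weights = 1.0 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

y = np.arange(len(labels), 0, -1)  # one row per study, top to bottom
fig, ax = plt.subplots(figsize=(6, 3))

# Each study: a box scaled by its weight plus a line for its 95% CI
ax.errorbar(effects, y, xerr=1.96 * ses, fmt="none", ecolor="black")
ax.scatter(effects, y, s=120 * weights / weights.max(), marker="s", color="black")

# The pooled result drawn as a diamond on its own bottom row
ax.errorbar([pooled], [0], xerr=[1.96 * pooled_se], fmt="D", color="black")

ax.axvline(0, linestyle="--", color="gray")  # the "no effect" line
ax.set_yticks(list(y) + [0])
ax.set_yticklabels(labels + ["Pooled"])
ax.set_xlabel("Effect size (Hedges' g)")
plt.tight_layout()
plt.show()
```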

The Problem of Publication Bias

Meta-analyses can only combine studies that exist in the published literature, and that creates a well-known problem. Studies with exciting, statistically significant results are more likely to get published than studies that find nothing. This is sometimes called the “file drawer problem,” the idea that null results sit in researchers’ file drawers and never see print. If a meta-analysis inadvertently includes only the studies that “worked,” its pooled effect size will be inflated.

Researchers use several tools to detect this bias. A funnel plot graphs each study’s effect size against its precision (usually related to sample size). In the absence of bias, the plot should look like a symmetrical inverted funnel, with smaller studies scattering widely and larger studies clustering near the combined estimate. If one side of the funnel is conspicuously empty, that asymmetry suggests missing studies. Because eyeballing a funnel plot is subjective, researchers back it up with statistical tests like Egger’s test, which formally checks for asymmetry. These tests work best when at least 10 studies are available; with fewer, they lack the statistical power to detect bias reliably.
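Under the hood, Egger’s test is an ordinary regression: each study’s standardized effect (the effect divided by its standard error) is regressed on its precision (one over the standard error), and an intercept reliably different from zero flags asymmetry. A hand-rolled sketch with ten invented studies:

```python
import numpy as np
from scipy import stats

# Ten invented studies: effect sizes and standard errors
effects = np.array([0.55, 0.48, 0.62, 0.30, 0.70, 0.41, 0.66, 0.35, 0.52, 0.75])
ses = np.array([0.25, 0.20, 0.28, 0.12, 0.30, 0.15, 0.27, 0.13, 0.22, 0.32])

# Regress standardized effect on precision; the intercept is the test
z = effects / ses
precision = 1.0 / ses
X = np.column_stack([np.ones_like(precision), precision])
coef = np.linalg.lstsq(X, z, rcond=None)[0]

# Standard error of the intercept from the usual OLS covariance matrix
resid = z - X @ coef
df = len(z) - 2
cov = (np.sum(resid**2) / df) * np.linalg.inv(X.T @ X)
t_stat = coef[0] / np.sqrt(cov[0, 0])
p = 2 * stats.t.sf(abs(t_stat), df)
print(f"intercept = {coef[0]:.2f}, t = {t_stat:.2f}, p = {p:.3f}")
```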

Reporting Standards: PRISMA

To ensure meta-analyses are transparent and reproducible, most psychology journals require authors to follow the PRISMA 2020 guidelines, a 27-item checklist covering every stage of the review. Items range from describing the rationale for the review to declaring competing interests. The guidelines require authors to report how many reviewers screened each record, whether they worked independently, and how disagreements were resolved. If automation tools were used to help screen studies, the checklist requires disclosure of how those tools fit into the process.

PRISMA also includes a standardized flow diagram showing exactly how many records were identified, screened, excluded (and why), and ultimately included. This diagram lets readers quickly judge whether the search was thorough and the filtering was reasonable. A meta-analysis that doesn’t follow PRISMA is harder to evaluate and, increasingly, harder to publish.

Why Meta-Analyses Matter in Psychology

Psychology as a field faces particular challenges that make meta-analysis especially valuable. Individual studies often use small samples, measure abstract constructs like self-esteem or working memory that are hard to pin down, and produce results that vary depending on the cultural context or the specific questionnaire used. A single study showing that mindfulness reduces anxiety is interesting but limited. A meta-analysis combining 80 such studies can estimate how large the effect really is, identify which populations benefit most, and flag whether the overall evidence base might be skewed by publication bias.

The method also plays a practical role in clinical psychology, where it informs treatment guidelines. When professional organizations recommend a particular therapy for depression or PTSD, that recommendation is typically grounded in meta-analytic evidence showing the therapy outperforms alternatives by a meaningful margin. For anyone reading psychological research, understanding what a meta-analysis does, and what its limitations are, makes it much easier to judge how seriously to take any individual claim.