A systematic review is a rigorous method of collecting and analyzing all available research on a specific question. A meta-analysis is the statistical technique that can follow, combining the numerical results of those studies into a single pooled estimate. They sit at the top of the evidence hierarchy in medicine, meaning they carry more weight than any individual study, including randomized controlled trials. The two terms are often used together, but they’re not the same thing, and understanding the difference helps you evaluate the research you encounter.
How They Relate to Each Other
Think of a systematic review as the foundation and a meta-analysis as an optional second floor built on top of it. The systematic review does the work of finding, screening, and evaluating every relevant study on a question. If the data from those studies are similar enough to combine mathematically, researchers can then perform a meta-analysis to calculate an overall effect. But if the studies are too different from each other, the systematic review gets published on its own without that statistical pooling.
This means every meta-analysis requires a systematic review first (the careful selection of studies is a precondition), but not every systematic review includes a meta-analysis. A systematic review that finds five studies using wildly different methods and measuring different outcomes might simply summarize and compare the findings narratively, without ever combining the numbers.
Why They Rank Above Other Evidence
In the evidence pyramid used across medical science, systematic reviews and meta-analyses occupy the highest level. Below them sit randomized controlled trials, then cohort and case-control studies, then case reports, and finally expert opinion at the base. The reason is straightforward: a single study, no matter how well designed, reflects one group of patients, one setting, and one set of conditions. A systematic review pulls together data from multiple high-quality studies, which reduces the influence of any single study’s quirks or biases. When you add meta-analysis, the pooled result has more statistical power than any individual trial could achieve alone.
This is why clinical guidelines and public health recommendations lean heavily on systematic reviews. When your doctor follows a treatment protocol, there’s a good chance the evidence behind it traces back to one of these reviews rather than to a single landmark trial.
How a Systematic Review Is Conducted
The process starts with a clearly defined research question, typically structured around a specific patient population, intervention, comparison, and outcome (the PICO framework). Before the search even begins, the research team writes and registers a protocol that spells out exactly how they’ll find studies, which criteria will be used to include or exclude them, and how they’ll assess quality. Making this plan public in advance prevents researchers from cherry-picking studies that support a preferred conclusion.
The literature search is the most labor-intensive step. Researchers search multiple databases using carefully constructed queries built with Boolean operators (AND, OR, NOT) and controlled vocabulary terms specific to each database. A medical review typically searches at least two or three major databases, because no single source indexes every published study. Search strategies also account for abbreviations, alternate spellings, and related terms to avoid missing relevant work. Two reviewers independently screen the results, first by title and abstract, then by reading the full text of potentially eligible studies. This duplication reduces the chance that a relevant study gets overlooked or an irrelevant one slips through.
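To make the Boolean logic concrete, here is a small Python sketch that assembles a query in the style of PubMed’s field tags. The condition and intervention terms are invented for this example, and real strategies run much longer and are tailored to each database:

```python
# A hypothetical search string in the style of PubMed field tags.
# The topic and terms are invented for illustration only.
population = ('("myocardial infarction"[MeSH Terms]'
              ' OR "heart attack"[Title/Abstract])')
intervention = ('(aspirin[MeSH Terms]'
                ' OR "acetylsalicylic acid"[Title/Abstract])')

# OR widens each concept to catch synonyms and alternate phrasings;
# AND narrows the search to studies matching both concepts.
query = f"{population} AND {intervention}"
print(query)
```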
Once the final set of included studies is established, each one is assessed for risk of bias. Researchers look at things like whether participants were randomly assigned, whether outcomes were measured consistently, and whether data were reported selectively. Studies with high risk of bias don’t necessarily get thrown out, but their limitations are documented and factored into the conclusions.
How a Meta-Analysis Combines Results
When the included studies measure the same outcome in compatible ways, the meta-analysis calculates a weighted average of their results. Larger, more precise studies contribute more to this average than smaller ones. The result is a single effect estimate, like a risk ratio or mean difference, that represents the best available answer to the research question.
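As a minimal sketch of how that weighting works, the snippet below pools four invented effect estimates using inverse-variance weights, the standard approach in which each study’s weight is one over the square of its standard error:

```python
import numpy as np

# Minimal sketch of inverse-variance pooling (fixed-effect model).
# Effect sizes and standard errors are made up for illustration; a
# risk ratio would typically be pooled on the log scale.
effects = np.array([0.12, 0.30, 0.25, 0.18])  # per-study estimates
ses     = np.array([0.10, 0.20, 0.08, 0.15])  # per-study standard errors

weights = 1.0 / ses**2              # precise (large) studies weigh more
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

# 95% confidence interval for the pooled estimate
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect: {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```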
One key decision is whether to use a fixed-effect or random-effects model. A fixed-effect model assumes all the studies are estimating the same underlying effect, and any differences between them are due to chance. A random-effects model allows for the possibility that the true effect varies from study to study, perhaps because of differences in patient populations or treatment protocols. In practice, the random-effects model is more commonly appropriate because studies almost always differ in ways that could influence results. However, when only a few studies are available, the fixed-effect model is sometimes preferred because there isn’t enough data to reliably estimate how much variation exists between studies.
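The DerSimonian-Laird method is one common way to estimate the between-study variance (usually written tau-squared) that a random-effects model adds to each study’s variance before weighting. A sketch with the same invented numbers:

```python
import numpy as np

# DerSimonian-Laird random-effects estimate; data are invented.
effects = np.array([0.12, 0.30, 0.25, 0.18])
ses     = np.array([0.10, 0.20, 0.08, 0.15])

w = 1.0 / ses**2
fixed = np.sum(w * effects) / np.sum(w)

# Cochran's Q measures observed between-study variation
Q = np.sum(w * (effects - fixed) ** 2)
k = len(effects)

# DerSimonian-Laird estimate of tau^2, the between-study variance
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)

# Re-weight with tau^2 added to each study's variance
w_re = 1.0 / (ses**2 + tau2)
pooled_re = np.sum(w_re * effects) / np.sum(w_re)
print(f"tau^2 = {tau2:.4f}, random-effects pooled = {pooled_re:.3f}")
```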
Measuring Inconsistency Between Studies
A statistic called I-squared quantifies how much the results of included studies differ from each other beyond what you’d expect from chance alone. The Cochrane Handbook, a widely used reference for review methodology, offers rough thresholds: 0% to 40% might not be important, 30% to 60% may represent moderate inconsistency, 50% to 90% may be substantial, and 75% to 100% is considered considerable. These ranges intentionally overlap because interpretation depends on context. An I-squared of 60% in a review of a straightforward drug comparison is more concerning than the same number in a review spanning diverse populations and settings.
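For the curious, I-squared falls out of a quantity called Cochran’s Q, which compares the observed variation between studies to what chance alone would produce. A minimal sketch, again with invented numbers:

```python
import numpy as np

# Computing I-squared from Cochran's Q; data are invented.
effects = np.array([0.12, 0.30, 0.25, 0.18])
ses     = np.array([0.10, 0.20, 0.08, 0.15])

w = 1.0 / ses**2
fixed = np.sum(w * effects) / np.sum(w)
Q = np.sum(w * (effects - fixed) ** 2)  # observed variation
df = len(effects) - 1                   # variation expected by chance

# I^2: the share of total variation beyond chance, floored at zero
i_squared = max(0.0, (Q - df) / Q) * 100
print(f"I^2 = {i_squared:.0f}%")
```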
Reading a Forest Plot
The forest plot is the signature visual output of a meta-analysis, and once you understand its components, you can quickly grasp what a review found. Each included study appears as a horizontal line with a square in the middle. The square’s position shows that study’s individual result (its effect estimate), and the square’s size reflects how much weight the study carries in the overall calculation. Larger squares mean the study contributed more information. The horizontal line extending through each square represents the confidence interval: the range within which the true effect likely falls.
At the bottom of the plot sits a diamond. The center of the diamond marks the pooled effect, the meta-analysis’s overall answer. The diamond’s width shows the confidence interval for that combined result. A narrow diamond means the estimate is precise; a wide one means there’s still considerable uncertainty. If the diamond crosses the vertical “no effect” line, the overall result is not statistically significant.
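The plot itself is straightforward to sketch. The example below uses matplotlib with the same invented numbers; real reviews typically rely on dedicated software such as RevMan or R’s metafor package, but the visual logic is the same:

```python
import numpy as np
import matplotlib.pyplot as plt

# Basic forest plot sketch: squares sized by weight, CI lines,
# and a diamond for the pooled result. Data are invented.
labels  = ["Study A", "Study B", "Study C", "Study D"]
effects = np.array([0.12, 0.30, 0.25, 0.18])
ses     = np.array([0.10, 0.20, 0.08, 0.15])

w = 1.0 / ses**2
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))

fig, ax = plt.subplots()
ys = np.arange(len(labels), 0, -1)
# Each study: a horizontal CI line with a square scaled by its weight
ax.errorbar(effects, ys, xerr=1.96 * ses, fmt="none", ecolor="black")
ax.scatter(effects, ys, marker="s", s=40 * w / w.max(), color="black")
# Pooled result at the bottom, drawn as a diamond with its own CI
ax.errorbar([pooled], [0], xerr=[1.96 * pooled_se], fmt="none",
            ecolor="black")
ax.scatter([pooled], [0], marker="D", s=120, color="black")
ax.axvline(0, linestyle="--", color="gray")  # "no effect" line
ax.set_yticks(list(ys) + [0])
ax.set_yticklabels(labels + ["Pooled"])
ax.set_xlabel("Effect estimate (95% CI)")
plt.show()
```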
Detecting Publication Bias
One threat to any meta-analysis is publication bias, the tendency for studies with positive or dramatic results to get published while studies finding no effect sit in file drawers. If this happens, the meta-analysis is working with a skewed sample and will likely overestimate the true effect.
Researchers check for this using a funnel plot, which graphs each study’s effect size against its precision. In an unbiased set of studies, the plot should look roughly symmetrical, like an inverted funnel, with small studies scattered widely at the bottom and large studies clustered near the top. Asymmetry in this plot, particularly a gap where small negative studies should appear, suggests that unfavorable results went unpublished. Visual inspection alone is unreliable, so researchers also use statistical tests to formally measure asymmetry. The two most common are a rank-correlation test (often called Begg’s test) and a regression-based test (Egger’s test) that examines whether smaller studies systematically show larger effects.
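The snippet below sketches the core idea behind the regression-based approach: regress each study’s standardized effect on its precision and ask whether the intercept differs from zero. The data are invented, and in practice this test is generally considered informative only with roughly ten or more studies:

```python
import numpy as np
from scipy import stats

# Regression-based asymmetry check (the idea behind Egger's test).
# Data are invented for illustration.
effects = np.array([0.12, 0.30, 0.25, 0.18, 0.40, 0.05])
ses     = np.array([0.10, 0.20, 0.08, 0.15, 0.25, 0.06])

standardized = effects / ses  # effect in standard-error units
precision    = 1.0 / ses      # large studies have high precision

result = stats.linregress(precision, standardized)
# A nonzero intercept suggests small studies show systematically
# different effects than large ones, i.e. possible publication bias.
t_stat = result.intercept / result.intercept_stderr
p_val = 2 * stats.t.sf(abs(t_stat), df=len(effects) - 2)
print(f"intercept = {result.intercept:.3f}, p = {p_val:.3f}")
```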
How Quality Is Assessed
Not all systematic reviews are created equal. A tool called AMSTAR 2 evaluates how well a review was conducted across 16 items, covering everything from whether the research question was clearly defined to whether the authors disclosed conflicts of interest. Several of these are considered critical domains: using a comprehensive search strategy, assessing risk of bias in included studies, using appropriate statistical methods for the meta-analysis, and accounting for risk of bias when interpreting the results.
The widely adopted PRISMA checklist serves a complementary purpose. Where AMSTAR 2 helps readers judge quality, PRISMA guides authors on what to report. Its 27-item checklist ensures that a published review includes enough detail for readers to understand why the review was done, how studies were identified and selected, and what was found. PRISMA also includes a flow diagram showing exactly how many studies were identified, screened, excluded (and why), and ultimately included. This transparency is what separates a systematic review from a traditional literature review, where an author might select studies informally and without documenting the process.
Systematic Reviews Without Meta-Analysis
It’s worth emphasizing that a systematic review without a meta-analysis is still valuable and still sits at the top of the evidence hierarchy. Sometimes the included studies measure outcomes so differently that pooling them statistically would be misleading. In these cases, the review presents a structured narrative synthesis: organizing, comparing, and interpreting results across studies without forcing them into a single number. The rigor comes from the transparent, reproducible search and selection process, not from the statistics.

