What Replication Means in Science and Why It Matters

In science, replication means running a study again to see whether the results hold up. More precisely, it involves collecting new data to test whether a previous finding can be confirmed by an independent effort. This is different from simply re-analyzing someone else’s original data. Replication is one of the core mechanisms that makes science trustworthy, because a finding that only appears once could easily be a fluke, a mistake, or the product of bias.

Replication vs. Reproducibility

These two terms are often used interchangeably in casual conversation, but they mean different things. The National Academies of Sciences, Engineering, and Medicine drew a clear line between them: reproducibility means taking the same data, the same code, and the same methods and checking that you get the same results. It’s essentially a verification step. Replicability means going out and gathering fresh data to answer the same question, then seeing if your new results match the original.

Think of it this way. If a chemist publishes a dataset and you download it, run the same analysis, and get the same numbers, that’s reproducibility. If you go into your own lab, run the same experiment from scratch with new samples, and find the same pattern, that’s replication. Both matter, but replication is the harder test because it asks whether a finding reflects something real about the world rather than something peculiar to one dataset or one lab.

Two Types of Replication

Scientists distinguish between direct replication and conceptual replication, and the difference matters.

A direct replication follows the original study’s methods as closely as possible, just with new participants or new samples. The goal is straightforward: if I do exactly what you did, do I see the same thing? Because the protocol embodies everything scientists currently believe is necessary to produce the finding, a direct replication tests whether that specific result is reliable.

A conceptual replication takes a different approach. Instead of copying the original method, researchers test the same underlying idea using a different technique, a different population, or a different experimental setup. If the same pattern shows up across multiple methods, that builds stronger confidence in the explanation behind the result, not just in the result itself. For example, if one team finds that sleep deprivation impairs memory using a word-recall test, and another team finds the same impairment using a spatial navigation task, that’s conceptual replication. The finding isn’t tied to one particular test.

Both types are essential. Direct replication catches errors and false positives. Conceptual replication tests whether an idea generalizes beyond the specific conditions of one experiment.

Why Replication Makes Science Self-Correcting

The public trusts scientific findings largely because science is supposed to be self-correcting. When a study produces a wrong or misleading result, other scientists should eventually catch it by trying to replicate the work. A finding that can’t be replicated loses credibility over time, while one that holds up across many attempts earns it.

In practice, though, this system has some serious gaps. The scientific community has historically shown little interest in performing or publishing confirmatory studies. Repeating someone else’s experiment isn’t glamorous, and journals have traditionally preferred novel findings over confirmations of old ones. This means flawed studies can persist in the literature for years, continuing to be cited and used as the foundation for new research, even when the original result was shaky.

Some journals have started pushing back against this problem. A growing number now publish negative results and studies that contradict previous findings, on the principle that all rigorous answers to important questions deserve to be shared.

The Replication Crisis

Starting around 2011, several large-scale projects tried to systematically replicate findings across entire fields, and the results were sobering. In psychology, the Open Science Collaboration attempted to replicate 100 published studies and found that only about a third of the replications produced statistically significant results, far fewer than the originals. In economics, a similar effort to replicate laboratory experiments from top journals succeeded in roughly 60 percent of cases, again revealing a gap between original claims and replicated results.

In cancer biology, the Reproducibility Project attempted to repeat selected experiments from 53 high-profile papers published between 2010 and 2012. Of the 50 experiments the team managed to repeat, only about 40% of positive findings successfully replicated by most criteria. Perhaps more striking: the median effect size in the replications was 85% smaller than in the original experiments, and 92% of replications found a smaller effect than the original. In other words, even when a finding did replicate, it was usually much weaker than originally reported.

These results don’t mean the original research was fraudulent. Many factors contribute, including small sample sizes, selective reporting of results, and pressure to produce dramatic findings. But the pattern made clear that published results in several fields were less reliable than the scientific community had assumed.

Why Replication Is Harder Than It Sounds

One reason replication doesn’t happen as often as it should is that doing it well requires significant resources. A study that barely reached statistical significance the first time needs a much larger sample to have a good chance of replicating. Research in statistics has shown that if an original study’s result just crossed the conventional significance threshold, a replication would need roughly 16 times the original sample size to have an 80% chance of confirming the finding. Even for stronger original results, the replication typically needs 3.5 times the sample size or more.
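The scaling behind these numbers comes from standard power analysis: the sample size needed to detect an effect grows with the inverse square of the effect size, so a modest shrinkage in the true effect inflates the required sample dramatically. A minimal sketch using the normal approximation for a two-sample test (the effect sizes below are illustrative, not drawn from any study cited here):

```python
import math
from statistics import NormalDist

def required_n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample test
    (normal approximation), given a standardized effect size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # power requirement
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)

# If the original study reported d = 0.5 but the true effect is only
# half that (effect sizes often shrink on replication), the required
# sample quadruples, since n scales as 1/d^2.
n_original = required_n_per_group(0.5)   # 63 per group
n_shrunk = required_n_per_group(0.25)    # 252 per group
```

The 1/d² scaling is why replications planned around a deflated estimate of the true effect can end up several times larger, and several times more expensive, than the study they are checking.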

This means replication studies are often more expensive and time-consuming than the original work. Funding agencies and universities rarely prioritize them, which creates a structural disincentive.

Registered Reports and Better Incentives

One of the most promising reforms is the registered reports model. In a traditional publishing process, researchers conduct a study and then submit the results. Journals decide whether to publish based partly on whether the findings are interesting or statistically significant. This creates a bias toward positive, surprising results and against null findings.

Registered reports flip this process. Researchers submit their question and methods before collecting data. The journal reviews the proposal and, if the question is important and the methods are sound, offers in-principle acceptance. The study then gets published regardless of what the results turn out to be. This removes the incentive to cherry-pick data or spin results, and it means that replication attempts with null findings actually see the light of day.

Because registered reports are reviewed for methodological rigor before the results exist, studies published through this model tend to be more reliable. That, in turn, makes them a stronger foundation for future research and future replication attempts.

Replication Across Different Fields

Replication looks different depending on the discipline. In a chemistry lab, you can closely control temperature, concentrations, and timing, so direct replication is relatively straightforward. In fields that study human behavior, like psychology or education, controlling every variable is much harder. Cultural differences, individual variation, and the passage of time all introduce noise.

In computational sciences, replication increasingly depends on sharing code and data so others can verify results. The FAIR principles (findable, accessible, interoperable, reusable) have become a widely adopted framework for making research materials available. If a climate model or a genomic analysis can’t be rerun by another team, it effectively can’t be replicated.
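One concrete way to make a computational result checkable is to publish the analysis code along with enough provenance, such as a checksum of the input data and any random seed, for an independent team to verify they are re-running the same thing. A toy sketch in Python (the data and seed are made up for illustration):

```python
import hashlib
import json
import random

DATA = b"1.2,3.4,2.2,5.1,4.0\n"  # stands in for a shared dataset file
SEED = 20240101                  # hypothetical seed, recorded with the results

def run_analysis(raw: bytes, seed: int) -> dict:
    """A deterministic analysis: seeded RNG plus recorded data hash."""
    values = [float(x) for x in raw.decode().strip().split(",")]
    rng = random.Random(seed)  # seeded RNG makes the bootstrap repeatable
    boot_means = [
        sum(rng.choices(values, k=len(values))) / len(values)
        for _ in range(1000)
    ]
    return {
        "data_sha256": hashlib.sha256(raw).hexdigest(),  # provenance
        "seed": seed,
        "mean": sum(values) / len(values),
        "boot_mean_of_means": sum(boot_means) / 1000,
    }

# Two independent runs on the same data and seed yield identical output,
# which is exactly what a reproducibility check verifies.
r1 = run_analysis(DATA, SEED)
r2 = run_analysis(DATA, SEED)
assert json.dumps(r1) == json.dumps(r2)
```

Recording the hash and seed means a second team can confirm both that they have the same bytes of data and that any stochastic steps were run identically, which is reproducibility in the National Academies' sense; replication would still require collecting new data.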

In fields like astronomy or geology, some observations simply can’t be repeated. You can’t re-observe a supernova that has already faded. In these cases, scientists rely more heavily on reproducibility (re-analyzing the same data with the same or improved methods) and on conceptual replication through independent observations of similar phenomena.