Repeatability vs. Reproducibility: How They Differ

Repeatability measures how consistent your results are when everything stays the same. Reproducibility measures how consistent they are when key conditions change, like the operator, equipment, location, or time. Both are types of measurement precision, but they answer different questions: repeatability asks “Can I get the same result twice?” while reproducibility asks “Can someone else get the same result?”

What Stays the Same and What Changes

Repeatability is precision under the tightest possible conditions. The same person uses the same instrument, follows the same procedure, works in the same location, and repeats the measurement within a short time window. The temperature, humidity, and other environmental factors stay roughly constant. If you ran a blood test three times in a row on the same analyzer with the same reagents, the spread of those three results reflects repeatability.

Reproducibility loosens those constraints deliberately. The measurement procedure stays the same, but nearly everything else can change: different operators, different instruments, different laboratories, different batches of supplies, and longer gaps between measurements (days, months, or even seasons). A reproducibility test might involve five labs in five cities all running the same assay on identical samples. The spread across all those results reflects reproducibility.

Because reproducibility introduces more sources of variation, it almost always produces a wider spread of results than repeatability does. That wider spread is expected and informative. It tells you how robust a measurement method really is once you strip away the controlled conditions of a single lab bench.
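The contrast can be sketched in a few lines of code. The numbers below are invented for illustration (imagine a blood analyte measured in mmol/L): three back-to-back runs in one lab stand in for repeatability conditions, and one run at each of five labs stands in for reproducibility conditions.

```python
import statistics

# Hypothetical results (illustrative values, not real data):
# three back-to-back runs in one lab vs. one run each at five labs.
same_lab_runs = [5.02, 5.04, 5.03]               # same analyst, analyzer, day
five_lab_runs = [5.03, 4.91, 5.12, 4.97, 5.08]   # one run per lab

repeatability_sd = statistics.stdev(same_lab_runs)
reproducibility_sd = statistics.stdev(five_lab_runs)

print(f"repeatability SD:   {repeatability_sd:.3f}")
print(f"reproducibility SD: {reproducibility_sd:.3f}")
```

The cross-lab standard deviation comes out several times wider, as expected: it folds operator, instrument, and environmental variation on top of the noise of a single bench.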

How They Fit Into Precision and Accuracy

Precision and accuracy are often confused, but they describe different things. Accuracy is how close a measurement lands to the true value. Precision is how close repeated measurements land to each other, regardless of whether they’re near the true value. You can be precise but inaccurate (all your shots hit the same wrong spot) or accurate but imprecise (shots scatter around the bullseye without clustering).
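The shooting-target analogy translates directly into two numbers: bias (mean distance from the true value, i.e. accuracy) and spread (standard deviation, i.e. precision). A toy example with made-up measurements of a quantity whose true value is 10.0:

```python
import statistics

TRUE_VALUE = 10.0  # assumed known reference value for this toy example

precise_but_biased = [12.01, 12.02, 11.99, 12.00]  # tight cluster, wrong spot
accurate_but_noisy = [8.5, 11.6, 9.2, 10.9]        # scattered around the truth

for name, shots in [("precise/biased", precise_but_biased),
                    ("accurate/noisy", accurate_but_noisy)]:
    bias = statistics.mean(shots) - TRUE_VALUE  # accuracy: closeness to truth
    spread = statistics.stdev(shots)            # precision: mutual closeness
    print(f"{name}: bias={bias:+.2f}, spread={spread:.2f}")
```

The first dataset has a large bias but tiny spread; the second has nearly zero bias but a large spread. Neither number tells you anything about the other.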

Repeatability and reproducibility are both subcategories of precision. ISO 5725 defines repeatability as “precision under repeatability conditions” and reproducibility as “precision under reproducibility conditions.” NIST uses the same framework. When someone simply says a method has “good precision,” they often mean repeatability alone, but a complete picture requires both. A method with tight repeatability but poor reproducibility works fine in one lab but falls apart when transferred elsewhere.

Intermediate Precision: The Middle Ground

Clinical and industrial labs often work with a third category called intermediate precision that sits between the two extremes. Under intermediate precision conditions, the laboratory stays the same, but measurements happen across different days, sometimes with different operators or different instruments within that lab. This is what most quality-control programs actually track, because pure repeatability (running everything back to back in one sitting) doesn’t reflect the day-to-day reality of a working lab.

A hospital lab, for instance, might run control samples every day for a month on two analyzers staffed by rotating technicians. The variation captured across those runs is intermediate precision. If that same hospital then compares its results against four other hospitals in the same healthcare network, the combined variation across all five sites reflects reproducibility. Even when each individual lab performs well on its own, pooling the data can reveal that one lab measures a given value differently from the others.
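The distinction between pure repeatability and intermediate precision shows up when QC data is grouped by day. In this sketch (hypothetical QC values for one control material), the within-day spread approximates repeatability, while the spread across all runs pooled over days (different technicians, possible recalibrations) approximates intermediate precision.

```python
import statistics

# Hypothetical QC results for one control material, grouped by day.
# Different days can mean different technicians and a recalibrated analyzer.
qc_by_day = {
    "day1": [4.98, 5.01, 5.00],
    "day2": [5.10, 5.08, 5.11],
    "day3": [4.92, 4.95, 4.93],
}

within_day_sds = [statistics.stdev(runs) for runs in qc_by_day.values()]
all_runs = [x for runs in qc_by_day.values() for x in runs]

print("mean within-day SD (~repeatability):      ",
      round(statistics.mean(within_day_sds), 3))
print("pooled SD (~intermediate precision):      ",
      round(statistics.stdev(all_runs), 3))
```

The within-day spread is tight, but day-to-day shifts inflate the pooled spread, which is exactly the variation a month of QC charting is meant to surface.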

Gauge R&R: How Manufacturing Uses Both

In manufacturing and quality control, a formal study called a Gauge R&R (Gage Repeatability and Reproducibility) quantifies both sources of variation at once. A typical setup works like this: several operators each measure the same set of parts multiple times using the same instrument. The variation within each operator’s repeated measurements is the repeatability component. The variation between different operators is the reproducibility component.

The goal is to figure out how much of the total measurement variation comes from the gauge itself (repeatability) versus the people using it (reproducibility). If the combined R&R variation is small relative to the tolerance range of the part, the measurement system is considered acceptable. If it’s large, engineers need to determine whether the problem is the instrument, the operators, or both before they can trust the data coming off the production line.
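A minimal variance-components sketch shows the arithmetic (this is a simplification of the layout described above, not the full AIAG ANOVA procedure, and the measurements are invented): repeatability is the pooled variance of each operator's repeats on each part, and reproducibility is estimated from the variance between operator means.

```python
import statistics

# Hypothetical Gauge R&R layout: 2 operators x 3 parts x 3 repeats.
measurements = {
    "op_A": {"part1": [10.1, 10.2, 10.1],
             "part2": [12.0, 12.1, 12.0],
             "part3": [9.5, 9.6, 9.5]},
    "op_B": {"part1": [10.4, 10.5, 10.4],
             "part2": [12.3, 12.4, 12.3],
             "part3": [9.8, 9.9, 9.8]},
}

# Repeatability: pooled variance of each operator's repeats on each part
# (equipment variation -- the gauge's own noise).
within_vars = [statistics.variance(reps)
               for parts in measurements.values()
               for reps in parts.values()]
repeatability_var = statistics.mean(within_vars)

# Reproducibility: variance of the overall operator means
# (appraiser variation -- systematic differences between people).
op_means = [statistics.mean([x for reps in parts.values() for x in reps])
            for parts in measurements.values()]
reproducibility_var = statistics.variance(op_means)

print(f"repeatability variance:   {repeatability_var:.4f}")
print(f"reproducibility variance: {reproducibility_var:.4f}")
```

In this invented dataset, operator B reads consistently about 0.3 units higher than operator A, so the reproducibility component dominates: the gauge is stable, but the people using it disagree.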

Why Reproducibility Matters in Science

The so-called “reproducibility crisis” in science is essentially a large-scale failure of reproducibility conditions. A meta-analysis of preclinical biomedical research estimated that only about 50% of published results could be successfully reproduced, and that this failure was costing roughly $28 billion per year in the United States alone on research that ultimately led nowhere.

Several factors drive this. Publication bias favors positive, novel findings over confirmations or null results, which means the published literature is skewed toward results that are more likely to be statistical flukes. The common threshold for statistical significance (a p-value below 0.05) compounds the problem. As Malcolm Macleod of the University of Edinburgh has explained, a study that barely clears that threshold has only about a 50% chance of producing a significant result again on replication, even if the underlying effect is real. Many so-called “failed replications” may simply be false negatives.
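The 50% figure follows from a simple statistical fact, which a short simulation can illustrate (a sketch of the general point, not Macleod's own calculation): if the original study's observed effect landed exactly at the significance cutoff (z = 1.96) and the true effect is exactly that large, a same-sized replication's z-statistic is centered on 1.96, so it clears the cutoff only about half the time.

```python
import random

random.seed(0)

TRUE_Z = 1.96   # true effect, expressed as the expected z-statistic
CUTOFF = 1.96   # one-tailed equivalent of the two-sided p < 0.05 threshold

# Each trial draws a replication's z-statistic: normally distributed
# around the true effect with unit standard error.
trials = 100_000
hits = sum(1 for _ in range(trials)
           if random.gauss(TRUE_Z, 1.0) > CUTOFF)

print(f"replication 'success' rate: {hits / trials:.2%}")  # close to 50%
```

Half of a symmetric distribution lies above its center, so half of all honest replications of a borderline result will "fail" by the significance criterion even when the effect is real.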

Perhaps the most practical issue is that published papers often don’t describe their methods in enough detail for another team to recreate the original setup. The Reproducibility Project: Cancer Biology found this to be one of its most consistent obstacles. Without knowing the exact environmental conditions, reagent batches, or animal strains used, a second lab is essentially guessing at the original conditions, which makes failure more likely and harder to interpret.

Reproducibility vs. Replicability

These two terms sound interchangeable, but a 2019 report from the National Academies of Sciences, Engineering, and Medicine drew a clear line between them. Reproducibility means taking the original data and code, rerunning the same analysis, and getting the same results. It’s a computational check: given identical inputs, does the same output appear? Replicability means collecting entirely new data to test whether a previous finding holds up. It’s a scientific check: does the same phenomenon occur when someone starts from scratch?

This distinction matters because a study can be perfectly reproducible (anyone who reruns the analysis gets the same numbers) yet fail to replicate (a new experiment with fresh data doesn’t find the same effect). Reproducibility confirms that the math was done correctly. Replicability confirms that the finding reflects something real about the world. Both are necessary, but they test different things.

Practical Takeaways

If you’re evaluating a measurement system, a lab result, or a scientific claim, the key question is which conditions were held constant and which were allowed to vary. High repeatability with unknown reproducibility means the result works under controlled conditions but hasn’t been stress-tested. High reproducibility means the result holds up across different people, places, and times, which is a much stronger statement.

In everyday terms: repeatability is whether you can bake the same cake twice in your own kitchen. Reproducibility is whether someone else can bake that cake in their kitchen, using their oven and their measuring cups, and end up with something recognizably the same. Both matter, but if you had to pick one as the stronger test of a recipe, it’s the second.