What Is Measurement Systems Analysis

Measurement systems analysis (MSA) is a structured method for determining how much of the variation you see in your data actually comes from the measurement process itself, rather than from real differences between the items being measured. Every measurement contains some error. MSA quantifies that error and tells you whether your measurement system is reliable enough for the decisions you’re making with the data. It’s a foundational tool in quality management, used heavily in manufacturing but applicable anywhere measurements drive decisions.

Why Measurement Error Matters

When you measure parts coming off a production line, inspect products for defects, or track a process over time, the numbers you record reflect two things blended together: the actual variation between items and the error introduced by your measurement process. If your measurement system contributes too much variation, you can’t tell good parts from bad ones. You might reject parts that are actually fine, accept parts that are out of spec, or conclude a process is unstable when it’s perfectly capable.

MSA separates these two sources of variation so you can see how much of your total observed variation comes from the parts themselves and how much comes from the act of measuring. Expressed in variance terms, total observed variation is the sum of part-to-part variation and measurement system variation; the components add as variances, not as standard deviations. The goal is to make the measurement system’s contribution as small as possible relative to the real differences you’re trying to detect.
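A quick sketch of that decomposition, using hypothetical standard deviations (the numbers are illustrative, not from any particular study):

```python
import numpy as np

# Variances add, so combine the spreads in squared units, not as std devs.
sigma_part, sigma_meas = 0.50, 0.20    # hypothetical spreads
sigma_total = np.sqrt(sigma_part**2 + sigma_meas**2)
share = 100 * sigma_meas / sigma_total
print(f"Total sigma: {sigma_total:.3f}, measurement share: {share:.0f}%")
```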

The Five Properties MSA Evaluates

A full measurement systems analysis looks at five properties, which fall into two categories: accuracy (how close your measurements land to the true value) and precision (how consistently they land in the same spot).

Accuracy Properties

Bias is the difference between the average of repeated measurements of a part and that part’s true value. If a steel rod is exactly 100 mm long but your caliper consistently reads 100.3 mm on average, you have 0.3 mm of bias. Think of it as the measurement system consistently aiming too high or too low.
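The arithmetic is simple; this minimal sketch assumes hypothetical repeated readings of the 100 mm rod above:

```python
import numpy as np

# Bias estimate: mean of repeated readings minus the known reference value.
true_value = 100.0
readings = np.array([100.31, 100.28, 100.33, 100.29, 100.30])  # hypothetical
bias = readings.mean() - true_value
print(f"Bias: {bias:+.2f} mm")   # ~ +0.30 mm: the caliper aims high
```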

Linearity describes whether the measurement error stays consistent across the range of values you measure. A system might be accurate when measuring small parts but increasingly inaccurate for larger ones. If a 100 cm object has 1 cm of error but a 150 cm object has 5 cm of error using the same instrument, the system is not linear. Linearity problems mean you can’t trust a single correction factor across your entire measurement range.
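One common way to check linearity is to measure references of known size across the working range and regress the bias against the reference value; a sketch, assuming hypothetical readings that echo the example above:

```python
import numpy as np

# Linearity check: does bias stay flat across the measurement range?
reference = np.array([50.0, 75.0, 100.0, 125.0, 150.0])   # known sizes, cm
measured  = np.array([50.2, 75.6, 101.0, 127.0, 155.0])   # hypothetical readings
bias = measured - reference

slope, intercept = np.polyfit(reference, bias, 1)
print(f"bias ~= {slope:.3f} * reference + {intercept:+.2f} cm")
# A slope near zero means one correction factor works everywhere;
# a clearly nonzero slope (as here) means error grows with part size.
```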

Stability is the ability of a measurement system to produce the same results when measuring the same item over time. An instrument that reads accurately on Monday but drifts by Friday has a stability problem. This is often tracked with control charts that monitor a reference standard at regular intervals.
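A minimal stability check in that style, assuming hypothetical daily readings of a 100 mm reference standard and individuals-chart (I-MR) control limits:

```python
import numpy as np

# Individuals chart on a reference standard: 3-sigma limits estimated from
# the average moving range (2.66 = 3 / d2, with d2 = 1.128 for n = 2).
readings = np.array([100.02, 99.99, 100.01, 100.03, 100.00,
                     99.98, 100.02, 100.04, 100.01, 100.05])  # hypothetical

center = readings.mean()
mr_bar = np.abs(np.diff(readings)).mean()
ucl, lcl = center + 2.66 * mr_bar, center - 2.66 * mr_bar
out = (readings > ucl) | (readings < lcl)
print(f"CL={center:.3f}, UCL={ucl:.3f}, LCL={lcl:.3f}, out-of-control: {out.sum()}")
```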

Precision Properties

Repeatability captures variation when the same operator measures the same part multiple times using the same instrument. If one person measures the same feature on the same part ten times and gets slightly different readings each time, that spread is repeatability error. It reflects the instrument’s inherent variation and the operator’s consistency in using it.

Reproducibility captures variation when different operators measure the same part with the same instrument. If three inspectors each measure the same part and their averages differ, that difference is reproducibility error. It reflects differences in technique, training, or interpretation between people.

Gage R&R: The Core Study

The most common MSA technique is a Gage R&R study, which combines repeatability and reproducibility into a single assessment of your measurement system’s precision. The “gage” refers to whatever instrument or method you’re evaluating.

In a typical crossed Gage R&R study, multiple operators each measure the same set of parts multiple times, and the measurements are randomized so operators don’t know which part they’re remeasuring. The analysis then uses statistical methods to break down the total variation into its components: how much comes from part-to-part differences, how much from the gage itself (repeatability), and how much from differences between operators (reproducibility). Some studies expand on this by adding multiple instruments. For instance, five parts measured by three operators using three randomly selected gages, with each combination measured twice, produces 90 total measurements.
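Here is a minimal sketch of the variance-component arithmetic for the basic parts × operators design, using simulated data and the standard two-way ANOVA formulas; it quantifies both the repeatability and the reproducibility defined above:

```python
import numpy as np

# Crossed Gage R&R sketch: data[i, j, k] = k-th measurement of part i by
# operator j. The data are simulated here; a real study would load them.
rng = np.random.default_rng(0)
p, o, r = 10, 3, 2                                   # parts, operators, replicates
part_effect = rng.normal(0, 1.00, size=p)            # real part-to-part differences
op_effect   = rng.normal(0, 0.20, size=o)            # operator effect (reproducibility)
data = (part_effect[:, None, None] + op_effect[None, :, None]
        + rng.normal(0, 0.15, size=(p, o, r)))       # gage noise (repeatability)

grand = data.mean()
part_mean = data.mean(axis=(1, 2))
op_mean   = data.mean(axis=(0, 2))
cell_mean = data.mean(axis=2)                        # part x operator averages

# Mean squares from the two-way crossed ANOVA
ms_part = o * r * ((part_mean - grand) ** 2).sum() / (p - 1)
ms_op   = p * r * ((op_mean - grand) ** 2).sum() / (o - 1)
ms_int  = (r * ((cell_mean - part_mean[:, None] - op_mean[None, :] + grand) ** 2).sum()
           / ((p - 1) * (o - 1)))
ms_rep  = ((data - cell_mean[:, :, None]) ** 2).sum() / (p * o * (r - 1))

# Variance components; negative estimates are conventionally truncated to zero.
var_rep  = ms_rep                                    # repeatability
var_int  = max(0.0, (ms_int - ms_rep) / r)
var_op   = max(0.0, (ms_op - ms_int) / (p * r))
var_part = max(0.0, (ms_part - ms_int) / (o * r))
var_grr  = var_rep + var_int + var_op                # total gage R&R

pct_grr = 100 * np.sqrt(var_grr / (var_grr + var_part))
ndc = int(np.sqrt(2 * var_part / var_grr))           # 1.41 * sigma_part / sigma_grr
print(f"%GRR: {pct_grr:.1f}%, ndc: {ndc}")
```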

The results are typically expressed as a percentage of total variation or as a percentage of the tolerance (the acceptable range for the characteristic being measured). The acceptance criteria are straightforward:

  • Below 10% of total variation: the measurement system is considered good
  • Between 10% and 30%: may be acceptable depending on the application, cost of the measurement device, and cost of rework
  • Above 30%: not acceptable, and the measurement system needs improvement
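Both reporting bases are easy to compute once a study yields the gage standard deviation. This sketch assumes hypothetical study results and spec limits; the 4th edition AIAG manual uses a 6-sigma gage spread for the tolerance ratio, though older studies used 5.15:

```python
# Converting a study's gage standard deviation into the two reporting bases.
sigma_grr, sigma_total = 0.05, 0.20     # hypothetical values from a completed study
usl, lsl = 10.6, 9.4                    # hypothetical spec limits

pct_total = 100 * sigma_grr / sigma_total
pct_tol   = 100 * 6 * sigma_grr / (usl - lsl)   # precision-to-tolerance (P/T) ratio
print(f"%GRR of total variation: {pct_total:.0f}%")
print(f"%GRR of tolerance:       {pct_tol:.0f}%")
```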

Another metric reported alongside these percentages is the number of distinct categories (ndc), which tells you how many groups within your data the measurement system can reliably distinguish. A measurement system needs to produce at least 5 distinct categories to be considered capable. Fewer than 2 means the system essentially can’t tell parts apart at all.
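Most software reports ndc straight from the study’s variance components. The AIAG formula, with the result truncated down to a whole number, is:

ndc = 1.41 × (σ_part / σ_GRR)

where σ_part is the part-to-part standard deviation, σ_GRR is the combined gage R&R standard deviation, and 1.41 is √2. With hypothetical values σ_part = 0.19 and σ_GRR = 0.05, ndc = ⌊1.41 × 3.8⌋ = 5, just meeting the threshold.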

MSA for Non-Numerical Measurements

Not all measurements produce numbers. Visual inspections, pass/fail decisions, and severity ratings are all measurement systems too, and they need evaluation. For these attribute measurements, MSA uses agreement analysis instead of Gage R&R.

The key statistic here is kappa, which measures the degree of agreement between raters beyond what you’d expect from chance alone. Kappa values range from negative 1 to positive 1, where 1 means perfect agreement and 0 means agreement no better than random chance. The Automotive Industry Action Group (AIAG) recommends a kappa value of at least 0.75 to indicate good agreement. When two raters are involved, Cohen’s kappa is used. For more than two raters, Fleiss’s kappa generalizes the same concept. When ratings have a natural order (like defect severity on a scale of 1 to 5), Kendall’s coefficients are more appropriate because they account for how close the disagreements are, not just whether raters agreed or disagreed.
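For the two-rater case, Cohen’s kappa is simple enough to compute by hand; this sketch uses hypothetical pass/fail calls:

```python
import numpy as np

# Cohen's kappa for two raters making pass/fail calls (1 = pass, 0 = fail).
rater_a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])   # hypothetical ratings
rater_b = np.array([1, 0, 0, 1, 0, 1, 1, 1, 1, 1])

p_o = np.mean(rater_a == rater_b)                    # observed agreement
# Chance agreement, from each rater's marginal pass/fail rates
p_a, p_b = rater_a.mean(), rater_b.mean()
p_e = p_a * p_b + (1 - p_a) * (1 - p_b)

kappa = (p_o - p_e) / (1 - p_e)
print(f"Cohen's kappa: {kappa:.2f}")   # ~0.52 here: below the 0.75 AIAG guideline
```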

How MSA Differs From Calibration

Calibration checks whether an instrument reads correctly against a known standard and adjusts it if needed. It addresses bias at a single point in time. MSA goes much further. It evaluates the entire measurement system, including the instrument, the people using it, the procedure they follow, and the environment they work in. A perfectly calibrated instrument can still produce unreliable data if operators use inconsistent techniques or if the measurement procedure is poorly defined. Calibration is a prerequisite for MSA, not a substitute for it.

The AIAG Standard

The global reference for MSA methodology is the AIAG Measurement Systems Analysis manual, now in its 4th edition. Published by the Automotive Industry Action Group, it’s one of five “core tool” manuals (alongside APQP/Control Plan, PPAP, FMEA, and SPC) that form the backbone of quality management in the automotive supply chain. While it originated in automotive manufacturing, the methods apply broadly. Any industry that makes decisions based on measured data benefits from understanding how much of that data is real and how much is noise from the measurement process itself.

What MSA Reveals in Practice

The real value of MSA shows up in what you do with the results. When repeatability dominates the measurement error, the problem is usually the instrument itself: it may lack sufficient resolution, need maintenance, or be poorly suited for the application. When reproducibility dominates, the problem is typically with the people or the procedure. Operators may need better training, the measurement method may need clearer instructions, or a fixture may be needed to reduce variation in how parts are positioned during measurement.

Sometimes MSA reveals that the measurement system is consuming so much of the tolerance that the process looks incapable even when it’s performing well. A process that appears to have a capability index below target may actually be perfectly fine once you account for the measurement noise inflating the variation. Fixing the measurement system in these cases is cheaper and faster than trying to improve a process that didn’t need improving.
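A hypothetical illustration of that effect, removing the gage variance from the observed variance before computing capability:

```python
import math

# Observed capability understates true capability when gage variation
# inflates the observed spread. All numbers here are hypothetical.
tol = 0.60                      # tolerance width (USL - LSL)
sigma_obs = 0.12                # std dev estimated from production data
sigma_grr = 0.07                # gage std dev from an R&R study

cp_obs = tol / (6 * sigma_obs)
sigma_true = math.sqrt(sigma_obs**2 - sigma_grr**2)   # variances add, so subtract
cp_true = tol / (6 * sigma_true)
print(f"Observed Cp: {cp_obs:.2f}, measurement-corrected Cp: {cp_true:.2f}")
# Prints roughly 0.83 vs 1.03: a seemingly incapable process is actually fine.
```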