What Is the Level of Measurement for Change in Health?

Change in health is most commonly measured at the ordinal level, meaning you can rank people’s health as better, the same, or worse, but you can’t say exactly how much better or worse with equal precision across the scale. This applies to the majority of health questionnaires, symptom scales, and patient-reported outcome measures used in clinical practice and research. However, the exact level of measurement depends on the specific tool being used, and some health metrics do reach higher levels of measurement.

Why Most Health Change Measures Are Ordinal

When researchers or clinicians track changes in health, they typically rely on scales that ask patients to rate how they feel. Think of a pain scale from 1 to 10, a satisfaction survey ranging from “much worse” to “much better,” or a quality-of-life questionnaire with Likert-style responses. These tools produce ordinal data: the responses have a clear ranking, but the gap between each point on the scale isn’t necessarily equal. The difference between a pain rating of 2 and 4 may not represent the same real-world change as the difference between 6 and 8.

This matters because ordinal data limits what you can do mathematically. Strictly speaking, you can’t subtract a “before” score from an “after” score and treat the result as a precise quantity. A patient who improves from stage III cancer to stage II has clearly gotten better, but that improvement isn’t the same measurable distance as going from stage II to stage I. The same logic applies to most health rating scales. As one research group put it, “an ordinal scale will not become an interval scale simply because of its popularity or by adding individual Likert-scale items scores together.”

When Health Change Reaches Higher Levels

Not every health measurement is ordinal. Some physical and physiological metrics operate at the ratio level, the highest level of measurement. These have a true zero point and equal intervals between values. Weight in kilograms, blood pressure in mmHg, duration of illness in days, height in centimeters, and specific lab values like liver enzyme levels all qualify. If you measure someone’s blood pressure before and after treatment, the change score is a genuine interval- or ratio-level quantity. A drop of 10 mmHg is the same amount of pressure whether it’s from 160 to 150 or from 130 to 120, even though its clinical importance may differ.
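As a rough illustration of why ratio-level change scores are safe to compute, the sketch below takes the two blood pressure drops from the paragraph above and reports both the absolute and the relative change. The function name change_summary is made up for this example.

```python
def change_summary(before, after):
    """Absolute and relative change for a ratio-level measurement."""
    absolute = after - before
    # Dividing by the baseline is meaningful only because the scale
    # has a true zero (0 mmHg means no pressure at all).
    relative = absolute / before
    return absolute, relative

for before, after in [(160, 150), (130, 120)]:
    absolute, relative = change_summary(before, after)
    print(f"{before} -> {after} mmHg: {absolute} mmHg ({relative:.1%})")
```

Both readings drop by the same 10 mmHg, while the relative change differs because the baselines differ. Neither calculation would be justified on an ordinal scale.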

Visual analog scales (VAS), where patients mark a point on a continuous line to indicate their pain or health status, occupy an interesting middle ground. Some researchers have argued these are merely ordinal, but at least five separate studies have demonstrated that VAS ratings of pain intensity show ratio-level properties, meaning they have a true zero point and the distances between markings are proportionally meaningful. This makes VAS data more mathematically flexible than a numbered rating scale, though the debate isn’t fully settled for every type of VAS application.

The Four Levels of Measurement in Context

To understand where health change fits, it helps to see all four levels side by side.

Nominal: Categories with no ranking. In health, this includes things like blood type, whether a symptom is present (yes/no), or surgical outcome (alive/dead). You can record that someone moved from one category to another, but you can’t measure the size of that change because the categories carry no order or distance.
Ordinal: Ranked categories where the spacing between ranks isn’t guaranteed to be equal. Cancer staging, pain scales, Likert-style health surveys, and BMI-based nutritional categories (severely thin, thin, normal, overweight, obese) all fall here. Most “change in health” measures live at this level.
Interval: Equal spacing between values, but no true zero. Temperature in Celsius is the classic example. Few health measures naturally land here, though statistical techniques can convert ordinal health scores to interval-level data.
Ratio: Equal spacing plus a true zero. Height, weight, age, lab values, and duration of illness all qualify. Subtracting one value from another is a fully valid arithmetic operation at this level, so change scores can be taken at face value.
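One way to keep the four levels straight is to encode which operations each one supports. The sketch below is only an illustration of the list above. The Level enum, the allowed_operations helper, and the operation labels are invented for this example and do not come from any standard library.

```python
from enum import IntEnum

class Level(IntEnum):
    """The four levels of measurement, ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def allowed_operations(level):
    """List the operations that are strictly justified at a given level."""
    ops = ["count categories"]
    if level >= Level.ORDINAL:
        ops.append("rank values")
    if level >= Level.INTERVAL:
        ops.append("subtract values")
    if level >= Level.RATIO:
        ops.append("take ratios")
    return ops

# Examples taken from the list above.
EXAMPLES = {
    "blood type": Level.NOMINAL,
    "cancer stage": Level.ORDINAL,
    "temperature in Celsius": Level.INTERVAL,
    "weight in kilograms": Level.RATIO,
}

for measure, level in EXAMPLES.items():
    print(f"{measure}: {level.name.lower()} -> {', '.join(allowed_operations(level))}")
```

Each level inherits everything the weaker levels allow, which is why subtraction (and therefore a strict change score) first appears at the interval level.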

The Problem With Change Scores on Ordinal Scales

Here’s where this gets practical. If a patient rates their health as 3 out of 5 before treatment and 5 out of 5 after, it’s tempting to say they improved by 2 points. But because ordinal scales don’t have guaranteed equal intervals, that “2-point improvement” isn’t a precise measurement. The jump from 3 to 4 might represent a much larger real-world change in wellbeing than the jump from 4 to 5.
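A small sketch can make this concrete. Below, the same five ordered health categories are coded two different ways. Both codings are hypothetical and both preserve the ranking, but they space the categories differently, so the size of a "change score" depends on which coding was chosen.

```python
# Two hypothetical numeric codings of the same five ordered categories.
# Both preserve the ranking; only the spacing between codes differs.
equal_spacing = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
uneven_spacing = {"poor": 1, "fair": 2, "good": 3, "very good": 6, "excellent": 7}

def change(coding, before, after):
    """Raw change score under a given numeric coding."""
    return coding[after] - coding[before]

# Patient A moves from good to excellent; patient B moves from poor to good.
for name, coding in [("equal", equal_spacing), ("uneven", uneven_spacing)]:
    a = change(coding, "good", "excellent")
    b = change(coding, "poor", "good")
    print(f"{name} coding: patient A = {a:+d}, patient B = {b:+d}")
```

Under the equal coding both patients improve by 2 points. Under the uneven coding patient A improves by 4 and patient B by 2. The direction of change is stable across codings, but the magnitude is not, which is exactly the limitation of ordinal change scores.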

Researchers handle this limitation in a few ways. One approach is to use a technique called Rasch analysis, which applies a mathematical model to convert ordinal scores into interval-level data. This produces conversion tables that rescale the raw scores so the distances between them become truly equal, making it valid to calculate averages and change scores. Several widely used health questionnaires, including versions of the World Health Organization’s quality-of-life measure, have published these conversion tables specifically so researchers can analyze change more precisely.
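In practice, using a published conversion table amounts to a lookup. The sketch below shows the idea with a table whose numbers are invented purely for illustration. A real analysis would use the table published for the specific questionnaire, and the names CONVERSION, to_interval, and interval_change are hypothetical.

```python
# HYPOTHETICAL conversion table: raw ordinal total -> interval-scaled score.
# These values are invented; real tables are specific to each instrument.
CONVERSION = {0: 0.0, 1: 14.0, 2: 23.0, 3: 30.0, 4: 36.0,
              5: 42.0, 6: 49.0, 7: 58.0, 8: 72.0}

def to_interval(raw_score):
    """Look up the interval-scaled equivalent of a raw ordinal score."""
    if raw_score not in CONVERSION:
        raise ValueError(f"raw score {raw_score} is not in the conversion table")
    return CONVERSION[raw_score]

def interval_change(raw_before, raw_after):
    """Change score computed on the converted (interval-level) scale."""
    return to_interval(raw_after) - to_interval(raw_before)

# Two patients who each gain 2 raw points, at different parts of the scale.
print(interval_change(2, 4))
print(interval_change(6, 8))
```

With this made-up table, the same 2-point raw gain corresponds to 13 converted units in the middle of the scale and 23 near the top, which is the kind of unevenness the conversion is meant to expose.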

Another practical approach is the concept of a minimal clinically important difference (MCID). First described in 1989, this is the smallest change score that patients actually perceive as meaningful and that would justify changing their treatment. The MCID acknowledges that statistical significance and real-world significance aren’t the same thing. A health questionnaire might detect a statistically significant half-point shift, but if patients can’t feel the difference, the change doesn’t matter clinically. MCIDs are defined based on patient self-reports, not clinical measurements, which keeps the focus on what the person actually experiences.
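Applying an MCID is essentially a threshold check on the change score. The following is a minimal sketch, assuming a questionnaire where higher scores mean better health and an MCID of 3 points. That threshold is a placeholder rather than a published value, and classify_change is a name invented for this example.

```python
def classify_change(before, after, mcid, higher_is_better=True):
    """Label a change score relative to a minimal clinically important difference."""
    delta = after - before
    if not higher_is_better:
        # On scales like pain, a lower score is an improvement.
        delta = -delta
    if delta >= mcid:
        return "meaningful improvement"
    if delta <= -mcid:
        return "meaningful deterioration"
    return "no meaningful change"

ASSUMED_MCID = 3  # placeholder; use the value published for the actual instrument

print(classify_change(40, 40.5, ASSUMED_MCID))  # small shift, below the threshold
print(classify_change(40, 45, ASSUMED_MCID))    # shift at or above the threshold
print(classify_change(7, 3, ASSUMED_MCID, higher_is_better=False))  # pain going down
```

The half-point shift in the first call might be statistically detectable in a large sample, but it falls below the assumed threshold and is labeled as no meaningful change.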

Which Statistical Tests Work for Each Level

If you’re analyzing health change data, the level of measurement determines which statistical tools are appropriate. For ordinal data like Likert-scale health ratings, nonparametric tests are the traditional choice. These methods work with the ranks of data rather than the raw values. The Wilcoxon signed rank test compares two measurements from the same group (like before and after treatment), while the Mann-Whitney U test compares two independent groups. For more than two sets of measurements, the Kruskal-Wallis test (independent groups) and the Friedman test (repeated measurements on the same people) serve as rank-based alternatives to one-way and repeated-measures ANOVA. Spearman rank correlation measures the relationship between two ordinal variables.
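To show what "working with ranks" means, here is a minimal pure-Python sketch of the statistic behind the Wilcoxon signed rank test for before-and-after ratings. It drops unchanged pairs, ranks the absolute differences (averaging ranks for ties), and sums the ranks of the improvements and the declines separately. The ratings are hypothetical, and the sketch stops at the rank sums rather than computing a p-value.

```python
def signed_rank_statistic(before, after):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    # Pairs with no change carry no information about direction and are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    # Order the differences by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical 1-5 health ratings for six patients (higher is better).
before = [3, 2, 4, 3, 1, 2]
after = [4, 4, 4, 2, 3, 3]
print(signed_rank_statistic(before, after))
```

For real analyses, an established implementation such as scipy.stats.wilcoxon is the safer route, since it also handles p-values. Note that the test still ranks the sizes of the differences, so it leans on the scale a little more than a pure count of improvements and declines would.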

That said, one of the more notable findings in this area is that parametric tests (the kind designed for interval and ratio data) actually perform well with ordinal health data too. A comprehensive review by Geoff Norman, a leading figure in medical education research methodology, demonstrated with both real and simulated data that parametric tests are robust enough to handle Likert-scale responses reliably. The practical consensus among many researchers is that parametric tests can be used to analyze ordinal health data, though you should still describe that data carefully. Reporting a mean pain score to the hundredths place, for example, implies a false level of precision when the underlying data are ordinal.

How Health Systems Use These Measurements

Patient-reported outcome measures (PROMs) are now collected routinely across healthcare systems, and how they’re analyzed depends partly on recognizing their level of measurement. At the individual level, a patient’s self-reported change scores help clinicians identify emerging concerns and adjust treatment plans. When aggregated across a hospital or clinic, the same ordinal data can reveal patterns in how groups of patients respond to specific programs, flag gaps in care, or help triage patients based on symptom severity. At the broadest level, health systems compare outcomes across providers and regions to guide policy decisions and resource allocation.
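When ordinal change scores are aggregated, one summary that stays within what the data supports is a simple count of how many patients moved in each direction. The sketch below assumes hypothetical before-and-after ratings on a scale where higher is better. The direction helper is invented for this example.

```python
from collections import Counter

def direction(before, after, higher_is_better=True):
    """Classify a patient's change using only the order of the two ratings."""
    if after == before:
        return "same"
    improved = (after > before) == higher_is_better
    return "better" if improved else "worse"

# Hypothetical (before, after) ratings for six patients in one clinic.
ratings = [(2, 4), (3, 3), (4, 3), (1, 2), (5, 5), (2, 3)]

summary = Counter(direction(before, after) for before, after in ratings)
print(dict(summary))
```

Because the summary relies only on ranking, it remains valid no matter how unevenly the scale points are spaced.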

Whether the data describe one patient, a clinic, or an entire health system, the core measurement challenge remains the same: most of this data is ordinal, and drawing meaningful conclusions about change requires understanding what that means. You can confidently say patients got better or worse and by how many scale points, but how much real-world change those points represent depends entirely on the tool used to measure it.