How to Measure Spread: Range, IQR, and Standard Deviation

Spread in statistics refers to how far apart values in a dataset sit from each other and from the center. The most common measures are range, interquartile range, variance, standard deviation, and the coefficient of variation, each suited to different situations. Choosing the right one depends on your data, whether you’re working with a full population or a sample, and whether outliers are a concern.

Range: The Simplest Measure

The range is the difference between the largest and smallest values in your dataset. If the highest test score in a class is 98 and the lowest is 42, the range is 56. It takes seconds to calculate and gives you an immediate sense of how wide your data stretches.
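
As a minimal sketch in Python, the calculation is a single subtraction. The list of scores is made up for illustration; only the 98 and the 42 come from the example above.

```python
# Hypothetical test scores; only the highest (98) and lowest (42) come from the text.
scores = [42, 67, 75, 81, 98]

# Range = largest value minus smallest value.
score_range = max(scores) - min(scores)
print(score_range)  # 56
```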

The tradeoff is that the range depends entirely on the two most extreme values. A single unusually high or low data point will inflate it, making your data look more spread out than most of it really is. For that reason, the range works best as a quick supplement to other measures rather than as your only tool.

Interquartile Range: Ignoring the Extremes

The interquartile range (IQR) solves the outlier problem by focusing on the middle 50% of your data. To find it, sort your values from smallest to largest and locate the quartiles, the cut points that divide the data into four equal groups. The lower quartile (Q1) is the median of the bottom half, and the upper quartile (Q3) is the median of the top half. The IQR is simply Q3 minus Q1.
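
The median-of-halves procedure can be sketched in Python as follows. The function name and sample values are hypothetical. The sketch also assumes one common convention for an odd number of values, where the overall median is left out of both halves. Other conventions exist, and software packages may return slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: with an odd number of values, the middle value is
    excluded from both halves. Needs at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle element when n is odd.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

values = [7, 1, 3, 9, 5, 11, 13, 15]  # made-up data
q1, q2, q3 = quartiles(values)
iqr = q3 - q1
print(q1, q3, iqr)  # 4.0 12.0 8.0
```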

Because it discards the top 25% and bottom 25% of values, the IQR stays stable even when your dataset includes extreme observations. If you’re comparing the typical spread of home prices in two neighborhoods and one neighborhood has a single mansion that sold for ten times the median, the IQR will give you a far more honest picture than the range.

The IQR also plays a key role in identifying outliers. A common rule flags any data point above Q3 + 1.5 × IQR, or below Q1 − 1.5 × IQR, as an outlier. This is the standard method used in box plots.
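
The 1.5 × IQR rule can be sketched in a few lines of Python. This is an illustration rather than a definitive implementation. It relies on `statistics.quantiles` from the standard library (Python 3.8+) with its default quartile method, which is only one of several conventions. The function name `iqr_outliers` and the data are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside the Q1 - k*IQR / Q3 + k*IQR fences."""
    q1, _, q3 = quantiles(values, n=4)  # default quartile method (an assumption)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

data = [12, 14, 15, 16, 17, 18, 19, 21, 60]  # made-up data with one extreme value
print(iqr_outliers(data))  # [60]
```

Here Q1 is 14.5 and Q3 is 20, so the fences sit at 6.25 and 28.25, and only the value 60 falls outside them.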

Variance and Standard Deviation

Variance and standard deviation are the workhorses of spread measurement. They account for every value in the dataset, not just the extremes or the middle, by measuring how far each data point sits from the average.

The process works like this: find the mean of your data, then subtract the mean from each value to get the deviation. Square each deviation (so negatives don’t cancel out the positives), then average those squared deviations. That average is the variance. Take its square root, and you have the standard deviation, which is back in the same units as your original data.

There is one important distinction. If your data represents an entire population (every student in a school, every transaction in a year), you divide by the total number of values. If your data is a sample drawn from a larger population, you divide by one less than the number of values. This adjustment, known as Bessel’s correction, compensates for the fact that deviations are measured from the sample’s own mean rather than the true population mean, so dividing by the full count would tend to underestimate the variability of the population the sample came from.
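
The steps above, including the population-versus-sample divisor, can be sketched in Python like this. The function names and the data are hypothetical. The standard library already offers `statistics.pvariance`, `statistics.variance`, `statistics.pstdev`, and `statistics.stdev` for the same calculations, so this version exists only to make each step visible.

```python
import math

def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by n - 1 (sample); sample=False divides by n (population).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    divisor = n - 1 if sample else n
    return sum(squared_deviations) / divisor

def std_dev(values, sample=True):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with a mean of 5

print(variance(data, sample=False), std_dev(data, sample=False))  # 4.0 2.0
print(round(variance(data), 3), round(std_dev(data), 3))          # 4.571 2.138
```

The sample figures come out slightly larger than the population figures because the same sum of squared deviations (32) is divided by 7 instead of 8.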

Standard deviation is generally more intuitive than variance because it’s expressed in the same units as the data. If you’re measuring heights in centimeters, the standard deviation is also in centimeters, while the variance is in centimeters squared. A small standard deviation means values cluster tightly around the mean; a large one means they’re widely scattered.

Coefficient of Variation: Comparing Across Scales

Standard deviation works well when you’re comparing datasets measured in the same units and with similar averages. But what if you want to compare the spread of employee salaries (measured in dollars) with the spread of employee ages (measured in years)? The raw standard deviations aren’t comparable because the scales are completely different.

The coefficient of variation (CV) handles this by expressing the standard deviation as a proportion of the mean. You divide the standard deviation by the mean, often multiplying by 100 to get a percentage. A CV of 15% means the typical spread is 15% of the average value, regardless of what units you started with. This makes it possible to compare variability across datasets with different units or wildly different means. The CV is only meaningful for quantities measured from a true zero with a mean well above zero; when the mean is near zero, the ratio becomes unstable.
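
A minimal Python sketch of the calculation follows. It assumes the data are a sample (so it uses `statistics.stdev`, which divides by one less than the count). The salary and age figures are invented for illustration.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100

salaries = [40000, 50000, 60000, 70000, 80000]  # dollars (made up)
ages = [25, 30, 35, 40, 45]                     # years (made up)

print(round(coefficient_of_variation(salaries), 1))  # 26.4
print(round(coefficient_of_variation(ages), 1))      # 22.6
```

The raw standard deviations (about 15,811 dollars and about 7.9 years) cannot be compared directly, but the percentages can: in this made-up data, salaries vary slightly more relative to their mean than ages do.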

Visualizing Spread With Box Plots

A box plot (sometimes called a box-and-whiskers plot) turns these numbers into a picture. It is built on what statisticians call the five-number summary: the minimum, Q1, the median, Q3, and the maximum. The box spans from Q1 to Q3, representing the middle 50% of data. A line inside the box marks the median. The “whiskers” extend from the box to the smallest and largest values that are not outliers (the minimum and maximum, if there are none), and any outliers appear as individual dots beyond the whiskers.
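
The numbers behind a box plot can be computed without drawing anything. The sketch below is illustrative: the function name `box_plot_stats` and the data are hypothetical, and it assumes the default quartile method of `statistics.quantiles` together with the common 1.5 × IQR rule for outliers. Plotting libraries may use slightly different quartile conventions.

```python
from statistics import quantiles

def box_plot_stats(values, k=1.5):
    """Five-number summary, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4)  # default quartile method (an assumption)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme values still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "five_number": (data[0], q1, med, q3, data[-1]),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

stats = box_plot_stats([3, 5, 7, 8, 9, 11, 13, 15, 40])  # made-up data
print(stats["five_number"])  # (3, 6.0, 9.0, 14.0, 40)
print(stats["whiskers"])     # (3, 15)
print(stats["outliers"])     # [40]
```

The maximum (40) belongs to the five-number summary, but the upper whisker stops at 15 because 40 lies beyond the upper fence and is drawn as a separate dot.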

At a glance, you can see how tightly grouped the data are, whether the distribution is symmetric or skewed, and whether outliers exist. A long, stretched box means high variability in the core of the data. A short, compact box means values are tightly clustered. When you place two or more box plots side by side, differences in spread and center become immediately obvious. If the data are right-skewed, with a long tail toward higher values, the median line will sit closer to the lower edge of the box, and the upper whisker will reach further than the lower one.

Index of Dispersion: Detecting Patterns in Count Data

When you’re working with count data (number of customer complaints per day, number of accidents per intersection), the index of dispersion reveals whether events are happening randomly, in clusters, or in an unusually regular pattern. It’s calculated by dividing the variance by the mean.

If events occur randomly and independently (as in a Poisson process), the variance and mean should be roughly equal, giving an index near 1. An index greater than 1 signals overdispersion, meaning events are clumped together more than randomness would predict. An index below 1 signals underdispersion, meaning events are more evenly spaced than randomness would predict. This is particularly useful in fields like ecology, quality control, and public health, where distinguishing random variation from meaningful clustering drives real decisions.
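
As a rough Python sketch, the index is a single division. The function name and the daily counts are hypothetical. The sketch assumes the counts are a sample, so it uses the sample variance (`statistics.variance`); `statistics.pvariance` would be the choice for a full population.

```python
from statistics import mean, variance

def index_of_dispersion(counts):
    """Variance-to-mean ratio for count data (sample variance assumed)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("index is undefined when the mean count is zero")
    return variance(counts) / m

# Made-up daily complaint counts, both with a mean of 3 per day.
clustered = [0, 0, 0, 12, 0, 0, 0, 12]  # complaints arrive in bursts
regular = [2, 3, 3, 3, 3, 3, 3, 4]      # almost the same number every day

print(round(index_of_dispersion(clustered), 2))  # 10.29
print(round(index_of_dispersion(regular), 2))    # 0.1
```

Both series average three complaints a day, yet the first is strongly overdispersed and the second strongly underdispersed, which the mean alone would never reveal.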

Choosing the Right Measure

Your choice depends on what your data looks like and what question you’re answering.

  • Range is useful for a quick snapshot, but don’t rely on it alone if outliers are possible.
  • Interquartile range is the better choice when your data contains extreme values or is skewed, because it focuses on the central bulk of observations.
  • Standard deviation is the most widely used measure and works well for roughly symmetric distributions. It’s the default in most statistical software and scientific reporting.
  • Coefficient of variation is essential when comparing spread across datasets with different units or scales.
  • Index of dispersion applies specifically to count data when you need to know whether observations are clustered, random, or evenly distributed.

In practice, you’ll often use more than one. Reporting the mean and standard deviation together gives a clear picture of center and spread for symmetric data. For skewed data, the median and IQR are a more honest pairing. And when presenting results to a non-technical audience, a box plot can communicate in seconds what a table of numbers takes minutes to explain.