What Is Considered Unusual in Statistics?

In statistics, a data point is generally considered unusual if it falls more than two standard deviations away from the mean. That threshold captures roughly 95% of all values in a normal distribution, so anything beyond it lands in the rarest 5%. But “unusual” is not a single fixed rule. Statisticians use several different methods and cutoffs depending on the context, the shape of the data, and how much is at stake.

The Standard Deviation Approach

The most common framework for defining unusual values is the empirical rule, which applies to data that follows a bell-shaped (normal) distribution. It breaks down like this:

  • Within 1 standard deviation of the mean: about 68% of all values
  • Within 2 standard deviations: about 95% of all values
  • Within 3 standard deviations: about 99.7% of all values

If a value sits more than two standard deviations from the mean, it’s in the outer 5% of the distribution. Most introductory statistics courses use this as the dividing line between “usual” and “unusual.” A value beyond three standard deviations is extremely rare, occurring less than 0.3% of the time, and is often flagged as a potential outlier.

To put this in concrete terms: if the average adult body temperature is 98.6°F with a standard deviation of 0.7°F, a reading of 100.0°F would be exactly two standard deviations above the mean. Anything higher than that would be statistically unusual.
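The temperature example above can be sketched in a few lines of Python. The figures (mean 98.6°F, standard deviation 0.7°F) come from the text; the helper name is just illustrative:

```python
# Empirical-rule check: how many standard deviations a reading
# sits from the mean, using the body-temperature figures above.
mean, sd = 98.6, 0.7

def n_sds_from_mean(x):
    """Distance from the mean, measured in standard deviations."""
    return (x - mean) / sd

print(round(n_sds_from_mean(100.0), 6))  # 2.0 -> right at the "unusual" line
print(abs(n_sds_from_mean(98.0)) > 2)    # False -> within the usual range
```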

Z-Scores: Measuring How Far Off a Value Is

A z-score translates any data point into the number of standard deviations it sits from the mean. A z-score of 0 means the value equals the mean. A z-score of 1.5 means it’s one and a half standard deviations above. A z-score of negative 2.4 means it’s 2.4 standard deviations below.

In practice, values with z-scores beyond +2 or -2 are typically labeled unusual, and values beyond +3 or -3 are considered extreme. However, NIST (the National Institute of Standards and Technology) notes that using standard z-scores can be misleading, especially with small datasets. That’s because both the mean and the standard deviation are themselves sensitive to outliers. One extreme value can inflate the standard deviation, which paradoxically makes that same value look less unusual than it really is.

For this reason, some statisticians prefer a modified z-score that uses the median instead of the mean and a measure called the median absolute deviation instead of the standard deviation. Researchers Boris Iglewicz and David Hoaglin recommend flagging any data point with a modified z-score above 3.5 as a potential outlier. This approach is more resistant to being distorted by the very values you’re trying to detect.
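A minimal sketch of the modified z-score, using Python's standard library. The 0.6745 constant and the 3.5 cutoff are the values Iglewicz and Hoaglin recommend; the function name and sample data are made up for illustration:

```python
import statistics

def modified_z_scores(data):
    """Modified z-scores: 0.6745 * (x - median) / MAD,
    where MAD is the median absolute deviation."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    return [0.6745 * (x - med) / mad for x in data]

data = [10, 11, 12, 11, 10, 95]
flags = [x for x, z in zip(data, modified_z_scores(data)) if abs(z) > 3.5]
print(flags)  # [95]
```

Because the median and MAD barely move when 95 is added, the outlier gets a huge modified z-score instead of inflating its own yardstick.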

The IQR Method for Spotting Outliers

Not all data follows a bell curve. When it doesn’t, the standard deviation approach can give unreliable results. The interquartile range (IQR) method works regardless of the distribution’s shape and is the basis for the familiar box plot.

The IQR is the range between the 25th percentile (Q1) and the 75th percentile (Q3) of your data, capturing the middle 50% of values. A data point is considered an outlier if it falls more than 1.5 times the IQR below Q1 or above Q3. So if Q1 is 20 and Q3 is 40, the IQR is 20. Any value below -10 (that’s 20 minus 30) or above 70 (40 plus 30) would be flagged as unusual.

This 1.5 multiplier is the standard convention for identifying mild outliers. Some analysts use a multiplier of 3 to identify extreme outliers, values so far from the bulk of the data that they almost certainly represent errors, rare events, or a fundamentally different process.
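The IQR fences described above can be sketched like this. `statistics.quantiles` is a real standard-library function; note that its default quartile method may give slightly different Q1/Q3 values than other software, and the sample data is invented:

```python
import statistics

def iqr_fences(data, k=1.5):
    """Tukey-style fences: values outside [Q1 - k*IQR, Q3 + k*IQR]
    are flagged. k=1.5 catches mild outliers; k=3 catches extreme ones."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [20, 24, 25, 28, 30, 33, 35, 38, 40, 110]
low, high = iqr_fences(data)
print([x for x in data if x < low or x > high])  # [110]
```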

P-Values and Statistical Significance

The concept of “unusual” also shows up in hypothesis testing, where the question shifts from “is this data point unusual?” to “is this result unusual enough that it probably didn’t happen by chance?” That’s where p-values come in.

A p-value measures the probability of seeing a result at least as extreme as yours if nothing interesting were actually going on (if the “null hypothesis” were true). The conventional threshold is 0.05: results that would occur less than 5% of the time under the null hypothesis are called “statistically significant.” (This is not the same as a 5% chance that the result is a fluke; the p-value assumes the null hypothesis is true and says nothing directly about how likely it is to be true.) Some fields use stricter thresholds of 0.01 (1%) or more lenient ones of 0.10 (10%), depending on the consequences of being wrong.
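For a test statistic that follows a standard normal distribution, the two-sided p-value can be computed with nothing but the standard library's `math.erfc`. This is a sketch for that specific case, not a general-purpose test:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))

print(round(two_sided_p(1.96), 3))  # ~0.05, the conventional cutoff
print(two_sided_p(3.0) < 0.01)      # True: three sigma clears stricter bars
```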

That said, the American Statistical Association issued a formal statement cautioning against treating 0.05 as a magic line. The statement warns that reducing scientific conclusions to rigid cutoffs like p < 0.05 “can lead to erroneous beliefs and poor decision making.” A p-value near 0.05, taken by itself, offers only weak evidence. Context matters: how the study was designed, how large the sample was, and whether the finding makes sense given everything else we know.

When the Stakes Are Higher, the Bar Goes Up

Different fields set different thresholds for “unusual” based on how costly a false claim would be. In most social science and medical research, p < 0.05 (roughly two standard deviations) is the standard. But particle physics demands five standard deviations, known as “five sigma.”

Five sigma corresponds to a probability of just 0.00006% that the result is a statistical fluke. This was the threshold CERN required before announcing the discovery of the Higgs boson in 2012. Physicists needed enough data for the signal to cross that line before they could claim they’d found a new particle rather than a random bump in the noise. Five sigma is considered the gold standard in that field precisely because it makes false discoveries vanishingly unlikely.

Why Unusual Values Matter

Outliers have an outsized effect on certain calculations. The mean includes every value, and the range depends entirely on the two most extreme ones, so a single extreme number can drag either far from where most of the data sits. The median, by contrast, is essentially immune to outliers because it only depends on the middle position, not on the actual values at the extremes. This is why real estate reports use median home prices rather than mean home prices: a handful of multimillion-dollar sales would make the average misleading for a typical buyer.

A quick diagnostic: if the mean and median of your data are far apart, and you’d expect the data to be roughly symmetric, there’s likely an outlier pulling the mean in one direction.
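The home-price point is easy to demonstrate with invented numbers: one luxury sale moves the mean dramatically while the median barely registers it.

```python
import statistics

# Five ordinary sales plus one extreme one (hypothetical figures).
prices = [250_000, 260_000, 275_000, 280_000, 300_000, 4_000_000]

print(statistics.mean(prices))    # dragged far above the typical sale
print(statistics.median(prices))  # 277500.0, still near the typical sale
```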

How Analysts Handle Unusual Values

Detecting an unusual value is only the first step. What you do with it depends on why it’s there. If a data entry reads 1145 when it should be 145, that’s a typo and should be corrected. If a value is genuinely extreme but real, removing it could erase important information.

Analysts generally choose from a few strategies. Trimming removes outliers entirely, which shrinks the dataset but eliminates their influence. Winsorization takes a different approach: instead of deleting extreme values, it replaces them with less extreme ones at a chosen boundary. In a 95% winsorization, for example, any value below the 5th percentile gets replaced with the 5th percentile value, and anything above the 95th percentile gets capped at that level. This keeps every data point in the set while limiting the pull of the extremes.
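A 90% winsorization (clamping at the 5th and 95th percentiles) can be sketched as follows. The helper name is hypothetical, and exact percentile values depend on the interpolation method `statistics.quantiles` uses:

```python
import statistics

def winsorize(data, lower_pct=5, upper_pct=95):
    """Replace values below the lower percentile and above the upper
    percentile with those percentile values, keeping every data point."""
    cuts = statistics.quantiles(data, n=100)  # 99 percentile cut points
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(x, low), high) for x in data]

data = list(range(1, 101))        # 1..100
w = winsorize(data)
print(len(w) == len(data))        # True: nothing is deleted
print(min(w) > 1, max(w) < 100)   # True True: extremes are pulled in
```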

The choice of method, and even the choice of where to draw the line between “unusual” and “normal,” involves judgment. As researchers have noted, deciding whether to use a cutoff of 2, 2.5, or 3 standard deviations is inherently subjective. There is no single correct answer. The key is being transparent about which rule you used and why, so others can evaluate whether your conclusions hold up under a different choice.