An outlier is a data point that sits an abnormal distance from the other values in a dataset. If you measured the heights of 20 people in a room and one person was 7 feet tall, that measurement would likely be an outlier. The concept comes up constantly in statistics, science, business, and everyday life whenever you’re looking at a collection of numbers and one (or a few) just don’t fit the pattern.
How Outliers Work in Practice
Think of outliers as the values that make you pause and ask, “Is that right?” In a formal statistical sense, an outlier is any observation that deviates so much from the rest that it appears to have been generated by a different process entirely. Sometimes it was: a typo in data entry, a broken sensor, or a mistake in measurement. But sometimes the outlier is completely real, representing a genuinely rare event or an unusual individual.
That distinction matters. A data entry error that records someone’s age as 999 is an outlier you’d want to fix. But a patient in a clinical trial who responds dramatically better than everyone else might represent a genuine biological difference worth investigating. Researchers at the National Institutes of Health note that characterizing what counts as “normal” for a specific experiment is a critical step before you can even identify which points qualify as outliers.
How Outliers Are Identified
There are two common methods for flagging outliers. Both measure how far a data point falls from the bulk of the data, scaled by how spread out that data is.
The IQR Method
The most widely used approach is the interquartile range (IQR) method. It works by finding the middle 50% of your data, then drawing “fences” beyond which any value counts as an outlier. Here’s how it works: first, you find the values at the 25th percentile (Q1) and the 75th percentile (Q3). The IQR is the gap between them. Then you multiply that gap by 1.5 and subtract it from Q1 to get the lower fence, and add it to Q3 to get the upper fence. Anything outside those fences is an outlier.
A quick example: if the 25th percentile of exam scores is 80 and the 75th percentile is 90, the IQR is 10. Multiply by 1.5 to get 15. The lower fence is 65 and the upper fence is 105. Any score below 65 or above 105 would be flagged. NIST takes this a step further by adding a second pair of fences at 3 times the IQR beyond the quartiles. Points between the inner (1.5 × IQR) and outer (3 × IQR) fences are “mild” outliers, and points beyond the outer fences are “extreme” outliers. In the exam example, the outer fences sit at 50 and 120.
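The fence arithmetic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `iqr_fences` and `classify_iqr` are hypothetical, and the sketch assumes the `"inclusive"` quartile method of Python’s `statistics.quantiles`. Software packages use several slightly different quartile conventions, so fences computed elsewhere may differ a little for the same data.

```python
import statistics

def iqr_fences(q1: float, q3: float, k: float) -> tuple[float, float]:
    """Return the (lower, upper) fences k times the IQR beyond Q1 and Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def classify_iqr(values: list[float]) -> list[tuple[float, str]]:
    """Label each value 'ok', 'mild' (between the 1.5x and 3x IQR fences),
    or 'extreme' (beyond the 3x IQR fences), following the two-fence scheme."""
    # Assumed quartile convention; other methods give slightly different Q1/Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    inner_lo, inner_hi = iqr_fences(q1, q3, 1.5)
    outer_lo, outer_hi = iqr_fences(q1, q3, 3.0)
    labels = []
    for v in values:
        if v < outer_lo or v > outer_hi:
            labels.append((v, "extreme"))
        elif v < inner_lo or v > inner_hi:
            labels.append((v, "mild"))
        else:
            labels.append((v, "ok"))
    return labels

# The exam example: Q1 = 80, Q3 = 90.
print(iqr_fences(80, 90, 1.5))  # (65.0, 105.0)
print(iqr_fences(80, 90, 3.0))  # (50.0, 120.0)

# Hypothetical scores chosen so that Q1 = 80 and Q3 = 90.
scores = [85, 60, 130, 80, 90, 78, 110, 84, 86]
print(classify_iqr(scores))  # 60 and 110 are 'mild', 130 is 'extreme'
```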
The Z-Score Method
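The other common approach measures how many standard deviations a value sits from the mean, a quantity called its z-score. A data point 3 or more standard deviations from the mean is typically considered an outlier, a threshold used in many clinical and scientific settings. This method has a weakness, though: in small datasets, the largest possible z-score is mathematically capped. With the usual sample standard deviation, no value in a sample of n points can have a z-score above (n − 1)/√n, so in a sample of 10 or fewer the 3-standard-deviation rule can never flag anything, however extreme a value is. Statisticians Boris Iglewicz and David Hoaglin recommend a modified z-score that replaces the mean with the median and the standard deviation with the median absolute deviation (MAD), flagging any value whose modified score exceeds 3.5 in absolute value.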
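The contrast between the two scores shows up clearly on a small sample. In the sketch below, the function names and the ten-value dataset are hypothetical. The modified score uses the constant 0.6745 from Iglewicz and Hoaglin’s formula, which rescales the MAD so it is comparable to a standard deviation for roughly normal data.

```python
import statistics

def z_scores(values: list[float]) -> list[float]:
    """Standard z-scores using the sample standard deviation (n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def modified_z_scores(values: list[float]) -> list[float]:
    """Modified z-scores: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]

data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 100]  # hypothetical sample, n = 10
print(max(z_scores(data)))  # about 2.83: below 3, so the z-score rule misses 100
flagged = [v for v, m in zip(data, modified_z_scores(data)) if abs(m) > 3.5]
print(flagged)  # [100]
```

With only ten points, the value 100 cannot reach a z-score of 3, because it also inflates the standard deviation it is measured against. The median and MAD barely notice it, so its modified score is far above 3.5.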
Why Outliers Matter So Much
A single outlier can dramatically distort your understanding of a dataset, especially if you’re relying on averages. The mean is highly sensitive to extreme values. Consider three people aged 30, 40, and 50, whose mean age is 40. Add a fourth person aged 99, and the mean jumps to about 55 (219 ÷ 4 = 54.75), older than everyone in the original group. Swap that 99-year-old for a 10-year-old, and the mean drops to 32.5 (130 ÷ 4). One value moved the average by more than 20 years, while the rest of the group stayed exactly the same.
The median, by contrast, moves much less when outliers are present, because it depends only on the order of the values, not on how far the most extreme one lies from the rest. That’s why it’s the preferred measure of center for skewed data like home prices or income. Standard deviation, the most common measure of how spread out data is, also inflates significantly when outliers are included. This ripple effect means that outliers don’t just affect one number: they can distort averages, inflate measures of variability, and throw off the results of statistical tests all at once.
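A quick way to see this ripple effect is to compute all three statistics before and after adding a single extreme value. The ages and the helper name `summarize` are hypothetical, chosen only to illustrate the effect.

```python
import statistics

def summarize(ages: list[float]) -> tuple[float, float, float]:
    """Return (mean, median, sample standard deviation)."""
    return statistics.mean(ages), statistics.median(ages), statistics.stdev(ages)

group = [30, 40, 50]  # hypothetical ages
for label, ages in [
    ("original group", group),
    ("plus a 99-year-old", group + [99]),
    ("plus a 10-year-old", group + [10]),
]:
    mean, median, sd = summarize(ages)
    print(f"{label:20} mean={mean:6.2f} median={median:5.1f} sd={sd:5.1f}")
```

Here the mean swings between 54.75 and 32.5 depending on one value, the median only between 45 and 35, and the standard deviation roughly triples (from 10 to about 30.6) when the 99-year-old is added.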
Spotting Outliers Visually
Box-and-whisker plots are the go-to visual tool for spotting outliers. The box represents the middle 50% of the data (from Q1 to Q3), and the “whiskers” extend to the most extreme values that still fall within the 1.5 IQR fences. Any points beyond the whiskers are plotted individually as dots, making outliers immediately visible. The CDC recommends this chart type specifically because it makes the presence and location of outliers obvious at a glance. Scatter plots serve a similar purpose when you’re looking at the relationship between two variables, since outlying points tend to sit far from the cluster.
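Plotting libraries draw box plots for you (matplotlib’s `boxplot`, for example, defaults to `whis=1.5`), but computing the parts by hand shows where each piece of the chart comes from. This is a sketch assuming the `"inclusive"` quartile method. The function name `box_plot_parts` and the sample data are hypothetical.

```python
import statistics

def box_plot_parts(values: list[float]) -> dict:
    """Compute the pieces of a Tukey-style box plot: box edges and median,
    whisker ends (most extreme values inside the 1.5x IQR fences), and the
    outliers that would be drawn as individual dots."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "box": (q1, median, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }

print(box_plot_parts([85, 60, 130, 80, 90, 78, 110, 84, 86]))
```

The whiskers stop at 78 and 90, the most extreme values still inside the fences, rather than at the fences themselves (65 and 105). The values 60, 110, and 130 would appear as separate dots.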
What Causes Outliers
Outliers generally fall into a few categories. Measurement errors, such as a miscalibrated instrument or a data entry mistake, are the most straightforward. These are usually safe to correct or remove once you’ve confirmed the error. Sampling problems can also produce outliers: if you’re studying college students’ study habits and accidentally include a professor, that data point will look unusual because it came from the wrong population.
Natural variation is the trickiest source. Some outliers are simply rare but real. A 2022 paper in Integrative and Comparative Biology argued that biology has a habit of dismissing genuine outliers as errors, which can cause researchers to miss important processes like the rare individuals driving evolution in a population. The authors noted that ignoring real data simply because it’s unusual “threatens to limit our interpretation of important biological processes.”
How Analysts Handle Outliers
Once you’ve identified an outlier, you have several options, and choosing the right one depends on why the outlier exists.
- Removal (trimming): You delete the outlier entirely. This makes sense for confirmed errors but introduces bias if the value is legitimate.
- Replacement (Winsorization): You replace the outlier with the nearest non-outlier value. This keeps your sample size intact but still alters the original data.
- Keeping it: If the outlier is real and relevant, leaving it in is the most honest approach. You might use statistical methods that are resistant to extreme values, like reporting the median instead of the mean.
- Investigating it: Sometimes the outlier is the most interesting part of the data. In medicine, a patient who responds unexpectedly well to a treatment might reveal something about how the treatment works.
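The removal and replacement options can be sketched using the IQR fences as the cutoff. This choice is an assumption: Winsorization is often defined by fixed percentiles instead (SciPy’s `scipy.stats.mstats.winsorize`, for instance, takes percentile `limits`). The function names and data below are hypothetical.

```python
import statistics

def iqr_bounds(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Lower and upper IQR fences (inclusive quartile method, an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def trim(values: list[float]) -> list[float]:
    """Removal: drop every value outside the fences."""
    lo, hi = iqr_bounds(values)
    return [v for v in values if lo <= v <= hi]

def winsorize(values: list[float]) -> list[float]:
    """Replacement: pull each outlier in to the nearest value inside the
    fences, keeping the sample size unchanged."""
    kept = trim(values)
    floor, ceiling = min(kept), max(kept)
    return [min(max(v, floor), ceiling) for v in values]

data = [85, 60, 130, 80, 90, 78, 110, 84, 86]  # hypothetical
print(trim(data))       # [85, 80, 90, 78, 84, 86]
print(winsorize(data))  # [85, 78, 90, 80, 90, 78, 90, 84, 86]
```

Trimming shrinks the sample from nine values to six, while Winsorization keeps all nine but replaces 60 with 78 and both 110 and 130 with 90.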
Both trimming and Winsorization can improve the accuracy of your estimates, but they can also introduce bias, particularly if the outlier thresholds aren’t set carefully. In clinical trials, the NIH expects researchers to establish their exclusion criteria before the study begins, not after they’ve already seen the data, to prevent cherry-picking results.
Outliers in Regression and Prediction
When you’re building a model to predict one variable from another, outliers take on additional significance because a single unusual point can tilt the entire prediction line. Statisticians use a measure called Cook’s distance to quantify this effect. It combines two factors: how far a data point’s predicted value is from its actual value (its residual), and how much leverage the point has, which depends on how far its x-value sits from the rest of the x-values. A point with a high Cook’s distance is one whose removal would substantially change the model’s predictions. In practice, this helps analysts decide which data points deserve closer inspection rather than relying on a gut feeling about what looks “off.”
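For a straight-line fit with one predictor, Cook’s distance can be computed directly from the residuals and leverages. The sketch below assumes ordinary least squares with an intercept (two fitted parameters). The function name and data are hypothetical, and the 4/n cutoff is only one common rule of thumb; D > 1 is another.

```python
def cooks_distance(xs: list[float], ys: list[float]) -> list[float]:
    """Cook's distance for each point in a simple least-squares fit y = a + b*x.

    D_i = e_i**2 / (p * s**2) * h_i / (1 - h_i)**2, where e_i is the residual,
    h_i the leverage, s**2 the residual variance, and p = 2 fitted parameters.
    """
    n, p = len(xs), 2
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in residuals) / (n - p)
    # Leverage for one predictor: grows with the distance of x from the mean x.
    leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(residuals, leverages)]

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
ys = [2, 4, 6, 8, 10, 12, 14, 16, 18, 10]  # last point: far right and off the trend
d = cooks_distance(xs, ys)
cutoff = 4 / len(xs)  # one common rule of thumb
print([i for i, di in enumerate(d) if di > cutoff])  # [9]
```

Only the last point is flagged. Its x-value of 20 gives it high leverage, and it sits well below the trend of the other nine, so both factors are large at once.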
Outliers Beyond Statistics
The concept extends well beyond math. Malcolm Gladwell’s 2008 book “Outliers” popularized the term to describe people whose success falls far outside normal experience, arguing that extraordinary achievement results from hidden advantages, timing, and cultural factors rather than pure individual talent. In business, an outlier might be a quarter with unusually high sales that skews annual projections. In medicine, it might be a patient who recovers far faster than expected. In each case, the core question is the same: is this unusual data point noise to be ignored, or a signal worth understanding?

