Removing outliers starts with detecting them using a statistical method, then deciding whether each flagged point should be deleted, adjusted, or kept. The most common approach uses the interquartile range (IQR): calculate the spread of your middle 50% of data, multiply it by 1.5, and flag anything beyond that boundary. But detection is only half the job. How you handle those flagged points depends on your data, your goals, and whether the outlier represents a real observation or an error.
The IQR Method
This is the most widely used technique and the one behind the outlier dots you see on boxplots. First, sort your data and find Q1 (the 25th percentile) and Q3 (the 75th percentile). Subtract Q1 from Q3 to get the IQR. Then build a “fence” around your data: the lower fence is Q1 minus 1.5 times the IQR, and the upper fence is Q3 plus 1.5 times the IQR. Any value outside those fences is flagged as an outlier.
For example, if Q1 is 20 and Q3 is 30, the IQR is 10. Multiply by 1.5 to get 15. Your lower fence is 5 and your upper fence is 45. A data point of 50 would be flagged; a data point of 42 would not. The CDC’s data visualization tools use this exact “IQR 1.5” rule by default when generating boxplots, and it’s the standard in most statistical software.
You can also use a stricter multiplier of 3.0 instead of 1.5 to identify only extreme outliers. This is useful when you want to be conservative and remove only the most dramatic deviations rather than trimming moderate ones.
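The fence calculation above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a library API: the function names are mine, and the "inclusive" quantile method is chosen because it matches the linear interpolation most statistical software uses for quartiles.

```python
from statistics import quantiles

def iqr_fences(values, multiplier=1.5):
    """Return (lower, upper) fences; points outside them are flagged."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

def flag_outliers(values, multiplier=1.5):
    lower, upper = iqr_fences(values, multiplier)
    return [x for x in values if x < lower or x > upper]

# Mirrors the worked example: Q1 = 20, Q3 = 30 gives fences at 5 and 45.
print(iqr_fences([10, 20, 25, 30, 40]))   # (5.0, 45.0)
print(flag_outliers([20, 22, 25, 28, 30, 50]))
```

Passing `multiplier=3.0` gives the stricter extreme-outlier fences described above without any other change to the code.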
The Z-Score Method
If your data follows a roughly normal (bell-shaped) distribution, the Z-score approach works well. A Z-score tells you how many standard deviations a data point sits from the mean. The standard threshold is 3: any point with a Z-score above 3 or below -3 is considered a potential outlier. This makes intuitive sense because 99.7% of normally distributed data falls within three standard deviations of the mean, so anything beyond that range is genuinely unusual.
The catch is that the Z-score method has a serious weakness with small datasets. Because the mean and standard deviation are themselves pulled toward extreme values, an outlier inflates the very yardstick used to measure it, which caps how large any Z-score can get: the maximum possible Z-score in a sample of n points is (n − 1)/√n, which does not exceed 3 until the sample reaches about a dozen items. A dataset of ten values can therefore never produce a Z-score above 3, no matter how extreme one of them is. If your dataset is small or your data is skewed, you need a different tool.
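A short standard-library sketch makes the weakness concrete. The function name is illustrative; the second call shows a ten-point dataset where even a value of 1000 goes undetected, because the outlier has inflated the mean and standard deviation enough to hide itself.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Flag points more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [x for x in values if abs(x - mu) / sigma > threshold]

print(zscore_outliers([10] * 15 + [100]))          # flags 100
print(zscore_outliers([1] * 9 + [1000]))           # flags nothing: n is too small
```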
Modified Z-Score for Skewed or Small Data
The modified Z-score swaps out the mean for the median and replaces the standard deviation with the median absolute deviation (MAD). Both of these are “robust” measures, meaning a few extreme values won’t distort them the way they distort the mean. This makes the modified Z-score a better choice when your data isn’t normally distributed, when it’s skewed, or when you’re working with a small number of observations. A common threshold is 3.5: any modified Z-score above that flags the point for review.
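A sketch of the modified Z-score, again with illustrative names. The 0.6745 scaling constant (the 75th percentile of the standard normal distribution) makes the MAD comparable to a standard deviation for normal data; the 3.5 threshold follows the widely cited Iglewicz and Hoaglin guideline. Note that it catches the small-sample case the ordinary Z-score missed.

```python
from statistics import median

def modified_zscore_outliers(values, threshold=3.5):
    """Flag points whose modified Z-score, 0.6745 * (x - median) / MAD, exceeds the threshold."""
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        return []  # degenerate case: more than half the values are identical
    return [x for x in values if abs(0.6745 * (x - med) / mad) > threshold]

# Ten points, too few for the ordinary Z-score method, but the median and MAD
# are untouched by the extreme value, so it is flagged cleanly.
print(modified_zscore_outliers([1, 2, 2, 3, 3, 3, 4, 4, 5, 1000]))
```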
Detecting Outliers in Regression Models
When you’re fitting a regression line, a single unusual point can drag the entire model in its direction. Cook’s Distance measures how much each data point influences your regression results. Two guidelines are commonly used: a Cook’s Distance greater than 0.5 warrants a closer look, and a value greater than 1 means the point is very likely influential. Another practical rule is to simply look for values that visually stand out from the rest of the Cook’s Distance values. If one point towers over the others, it’s almost certainly pulling your model.
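For readers who want to see the mechanics, here is a from-scratch NumPy sketch for simple linear regression, computing each point's Cook's Distance from its residual and its leverage (the diagonal of the hat matrix). In practice you would normally get this from your statistics package (for example, statsmodels exposes it through a fitted model's influence diagnostics) rather than computing it by hand.

```python
import numpy as np

def cooks_distance(x, y):
    """Cook's Distance for each point in a simple linear regression of y on x."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])  # add intercept column
    n, p = X.shape
    # Hat matrix H = X (X'X)^-1 X'; its diagonal h_ii is each point's leverage.
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    residuals = y - H @ y
    mse = residuals @ residuals / (n - p)
    # D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2
    return (residuals ** 2 / (p * mse)) * h / (1 - h) ** 2

# Nine points on a perfect line plus one displaced endpoint: the displaced
# point's distance exceeds 1, while every other point stays below 0.5.
x = np.arange(10.0)
y = 2 * x + 1
y[9] = 40.0
print(cooks_distance(x, y))
```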
For datasets with multiple variables, the Mahalanobis distance measures how far a point is from the center of all your data simultaneously, accounting for correlations between variables. The standard version uses the sample mean and covariance matrix, but these are themselves sensitive to outliers. A robust variant that uses more resistant estimates of center and spread tends to perform better in practice.
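A minimal NumPy sketch of the classical (non-robust) version: the squared Mahalanobis distance of each row from the sample mean, using the sample covariance matrix. The 7.378 cutoff assumed below is the 97.5th percentile of a chi-square distribution with 2 degrees of freedom, a common choice for two variables; robust variants (such as scikit-learn's minimum covariance determinant estimator) swap in resistant estimates of center and spread but score points the same way.

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean."""
    mu = X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))  # sample covariance, inverted
    diffs = X - mu
    return np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs)

# Nine points that follow the correlation y ~ x, plus one point, (5, -1),
# that is unremarkable on either axis alone but violates the joint pattern.
X = np.array([[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.8], [5, 5.1],
              [6, 6.0], [7, 7.2], [2.5, 2.4], [4.5, 4.6], [5, -1.0]])
d2 = mahalanobis_sq(X)
outliers = np.where(d2 > 7.378)[0]  # chi-square(2) cutoff at 97.5%
print(outliers)
```

A univariate method applied to each column separately would miss this point entirely, which is exactly the case the Mahalanobis distance exists for.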
Visual Detection With Plots
Before running any formula, plot your data. Boxplots automatically display outliers as individual dots beyond the whiskers, using the IQR 1.5 rule. Scatter plots reveal points that sit far from the cluster of your data or far from a trend line. Histograms show values in the tails that are clearly separated from the rest of the distribution. These visual tools often catch problems that pure formulas miss, like a cluster of unusual values that are technically within range but clearly belong to a different pattern.
Delete, Cap, or Keep
Once you’ve flagged outliers, you have three main options, and deletion is often the worst one.
- Deletion (trimming) removes the flagged points entirely. This is appropriate when you can identify a clear reason the point is invalid: a data entry error, a sensor malfunction, or a measurement from a subject who didn’t meet your study criteria. Trimming without a reason risks throwing away real, meaningful data. Any statistical method for flagging outliers will inevitably flag some valid points, so automatic deletion can bias your results.
- Winsorization replaces extreme values with the nearest non-outlier value instead of deleting them. If your upper fence is 45 and you have a data point at 60, Winsorization would change that 60 to 45. This preserves the data point’s existence and direction while limiting its pull on your statistics. It’s a good choice when you suspect the extreme value is real but don’t want it to dominate your analysis.
- Transformation applies a mathematical function to your entire dataset (like taking the log or square root of every value) to compress the range and pull extreme values closer to the rest. This works well when your data is naturally right-skewed, as many biological and financial datasets are.
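The two non-destructive options can be sketched with the standard library. The function names are illustrative; winsorization here caps at the IQR fences computed from the data itself, and the log transform assumes strictly positive values.

```python
import math
from statistics import quantiles

def winsorize(values, multiplier=1.5):
    """Cap values at the IQR fences instead of deleting them."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [min(max(x, lower), upper) for x in values]

def log_transform(values):
    """Compress a right-skewed range; requires strictly positive values."""
    return [math.log(x) for x in values]

# The extreme value is pulled back to the upper fence; the rest are untouched.
print(winsorize([10, 20, 25, 30, 40, 200]))
```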
The general principle: only trim if you believe the case doesn’t belong to the population you’re trying to study. If you can’t identify why it’s an outlier, consider capping or transforming rather than deleting.
Choosing the Right Method for Your Data
The IQR method works in the widest range of situations because it doesn’t assume your data is normally distributed. Use it as your default for general-purpose data cleaning. The Z-score method is appropriate when you have a reasonably large dataset (at least 12 points, ideally more) and your data roughly follows a bell curve. Switch to the modified Z-score when those conditions aren’t met. For regression analysis, Cook’s Distance is purpose-built to find points that distort your model.
In specialized fields, pre-defined thresholds often override general statistical rules. Clinical laboratories, for instance, set reference intervals from the central 95% of values in a healthy population, with exceptions for specific tests that use different percentiles or fixed decision limits. The point is that outlier thresholds aren’t universal. What counts as “too extreme” depends on your domain and what decisions hinge on the data.
Documenting What You Removed
Whatever method you choose, record every step. Save your raw data before any cleaning. Document which method you used, what threshold you applied, how many points were flagged, and what you did with each one (deleted, capped, or kept). Mark the date of each modification. This isn’t just good practice for academic research. It protects your own work: if your results look surprising later, you can trace back exactly what happened and verify that your cleaning process didn’t create the result. Transparency and reproducibility depend entirely on keeping a clear trail from raw data to final analysis.
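One lightweight way to keep that trail is a machine-readable log written alongside the cleaned file. The structure below is purely illustrative; the field names and filenames are hypothetical, not a standard format.

```python
import json
from datetime import date

# Hypothetical cleaning log: every field mirrors the checklist above.
cleaning_log = {
    "date": date.today().isoformat(),
    "method": "IQR",
    "multiplier": 1.5,
    "points_flagged": 3,
    "actions": {"deleted": 1, "capped": 2, "kept": 0},
    "raw_file": "measurements_raw.csv",     # untouched original, saved first
    "clean_file": "measurements_clean.csv",
}

with open("cleaning_log.json", "w") as f:
    json.dump(cleaning_log, f, indent=2)
```

Because the log is plain JSON, it can be diffed, version-controlled, and read back by whoever needs to audit the analysis later.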

