Mahalanobis distance measures how far a data point is from the center of a distribution, adjusted for the shape and spread of that distribution. Think of it as a smarter version of regular distance that accounts for the fact that your data might be stretched, squeezed, or tilted across different dimensions. It was introduced by the Indian statistician P. C. Mahalanobis in 1936, and it remains one of the most widely used distance metrics in statistics and machine learning.
The Core Idea
If you’ve taken a statistics course, you’ve probably seen the z-score: take a value, subtract the mean, and divide by the standard deviation. That tells you how many standard deviations a point is from the average. Mahalanobis distance is the multivariate version of this same concept. Instead of working with one variable at a time, it works with many variables simultaneously, telling you how many “standard deviations” a point is from the center of a cloud of data in multiple dimensions.
What makes it powerful is that it doesn’t treat each variable independently. If two variables are correlated (say, height and weight tend to increase together), Mahalanobis distance knows that. A person who is tall and light might be more unusual than a person who is tall and heavy, even if both are the same straight-line distance from the average. Regular distance measurements miss this entirely.
One helpful way to visualize it: imagine your data forms an elongated, tilted ellipse rather than a perfect circle. The Mahalanobis distance of any point is its distance from the center of that ellipse, divided by the width of the ellipse in that point’s direction. Two points that are equally far from the center in regular distance could have very different Mahalanobis distances if one sits along the ellipse’s wide axis and the other sits along its narrow axis.
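The ellipse picture can be made concrete with a small sketch. Using a hypothetical 2-D covariance matrix with strong positive correlation, the code below compares two points that are exactly the same Euclidean distance from the center: one lying roughly along the ellipse's long axis, the other along its short axis.

```python
import numpy as np

# Hypothetical 2-D distribution: correlated variables, so the ellipse is tilted.
mean = np.zeros(2)
cov = np.array([[4.0, 1.9],
                [1.9, 1.0]])   # strong positive correlation
cov_inv = np.linalg.inv(cov)

def mahalanobis(x, mean, cov_inv):
    """sqrt((x - mean)^T  cov_inv  (x - mean))"""
    d = x - mean
    return np.sqrt(d @ cov_inv @ d)

# Two points at the same Euclidean distance from the center...
a = np.array([2.0, 1.0])   # roughly along the ellipse's wide axis
b = np.array([1.0, -2.0])  # roughly along its narrow axis

print(np.linalg.norm(a), np.linalg.norm(b))  # equal straight-line distances
# ...but very different Mahalanobis distances: b is far more unusual.
print(mahalanobis(a, mean, cov_inv), mahalanobis(b, mean, cov_inv))
```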
Why Regular Distance Falls Short
Euclidean distance, the straight-line distance you learn in geometry, treats every dimension equally. That works fine when all your variables are measured on the same scale and aren’t correlated with each other. In practice, that almost never happens. If you’re measuring something in millimeters and something else in kilograms, the millimeter variable will dominate the distance calculation simply because its numbers are larger, not because it matters more.
Mahalanobis distance is unitless and scale-invariant. It doesn’t matter whether you measure in inches or centimeters, dollars or cents. The covariance structure of the data normalizes everything automatically. This makes it especially useful when you’re combining variables measured in completely different units, or when your features are correlated with each other in ways that would distort a simple distance calculation.
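The scale invariance is easy to verify numerically. In this sketch (with made-up height/weight-style data), rescaling one column from centimeters to inches changes the Euclidean geometry completely but leaves the Mahalanobis distance untouched, because the covariance matrix rescales along with the data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: column 0 in centimeters, column 1 in kilograms.
X = rng.normal(size=(200, 2)) @ np.array([[8.0, 3.0], [0.0, 1.0]]) + [170, 70]

def mahalanobis(x, X):
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d = x - mean
    return np.sqrt(d @ cov_inv @ d)

point = np.array([185.0, 80.0])

# Convert the first column to inches (divide by 2.54) and recompute.
scale = np.array([1 / 2.54, 1.0])
d_metric = mahalanobis(point, X)
d_inches = mahalanobis(point * scale, X * scale)
print(d_metric, d_inches)  # identical up to floating-point error
```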
How the Calculation Works
The calculation involves four steps, each building on the last. First, you compute the mean of your dataset across all variables. This gives you the “center” of the data cloud. Second, you compute the covariance matrix, which captures both how spread out each variable is and how each pair of variables moves together. The diagonal of this matrix holds the variance of each variable, while the off-diagonal entries hold the covariances between pairs.
Third, you invert that covariance matrix. This is the critical step. The inverse covariance matrix effectively “whitens” the data, transforming the tilted, stretched ellipse into a perfect sphere. Finally, for any point you want to evaluate, you subtract the mean to get a difference vector, sandwich the inverse covariance matrix between that vector and its transpose, and take the square root. The result is a single number representing how far that point is from the distribution’s center, measured in units that respect the distribution’s shape.
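The four steps translate directly into code. This sketch runs them on synthetic data and checks the result against SciPy's `scipy.spatial.distance.mahalanobis`, which takes the inverse covariance matrix as its third argument.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(42)
X = rng.multivariate_normal([0, 0], [[2.0, 1.2], [1.2, 1.0]], size=500)

# Step 1: the mean -- the center of the data cloud.
mean = X.mean(axis=0)
# Step 2: the covariance matrix (variances on the diagonal, covariances off it).
cov = np.cov(X, rowvar=False)
# Step 3: invert the covariance matrix.
cov_inv = np.linalg.inv(cov)
# Step 4: for a point x, d = sqrt((x - mean)^T  cov_inv  (x - mean)).
x = np.array([3.0, -1.0])
diff = x - mean
d = np.sqrt(diff @ cov_inv @ diff)

print(d)
print(mahalanobis(x, mean, cov_inv))  # SciPy agrees with the manual version
```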
If the covariance matrix is poorly conditioned (which happens when variables are nearly redundant or when you have more variables than observations), the inversion step can be unstable. In those cases, adding a small value to the diagonal or using a pseudo-inverse can stabilize the calculation.
Detecting Outliers
One of the most common uses of Mahalanobis distance is identifying outliers in multivariate data. The logic is straightforward: if a data point has an unusually large Mahalanobis distance from the rest of the dataset, it’s unusual in ways that account for the full correlation structure of the data. This catches outliers that would be invisible if you checked each variable on its own.
When the data follows a multivariate normal distribution, the squared Mahalanobis distances follow a chi-squared distribution with degrees of freedom equal to the number of variables. This gives you a principled threshold for flagging outliers. You can convert each squared distance into a probability and check whether it’s larger than you’d expect by chance. Points that fall far to the upper right of a chi-squared quantile plot are the suspicious ones.
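A minimal version of this thresholding, on synthetic multivariate-normal data, looks like the following; the 97.5% cutoff is a conventional illustrative choice, so roughly 2.5% of well-behaved points will be flagged by chance.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
p = 3  # number of variables
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=1000)

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mean
# Squared Mahalanobis distance for every row at once.
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# Flag points whose squared distance exceeds the 97.5% chi-squared quantile
# with degrees of freedom equal to the number of variables.
threshold = chi2.ppf(0.975, df=p)
outliers = np.where(d2 > threshold)[0]
print(f"{len(outliers)} of {len(X)} points flagged")  # roughly 2.5% expected
```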
There’s one catch: the standard mean and covariance estimates used to calculate Mahalanobis distance are themselves sensitive to outliers. A few extreme points can distort the covariance matrix enough to mask their own unusualness. Robust estimation methods, like the minimum covariance determinant, compute the covariance from the most “central” subset of the data, making the resulting distances much more reliable for outlier detection.
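scikit-learn ships a minimum covariance determinant estimator, `sklearn.covariance.MinCovDet`, whose `mahalanobis` method (like `EmpiricalCovariance`'s) returns squared distances. This sketch plants three extreme points in otherwise clean data and compares the two estimates; the masking effect shows up as the classical distances understating how unusual the planted points are.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.default_rng(0)
# Clean data plus a handful of extreme points that distort the plain covariance.
X = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=200)
X_contaminated = np.vstack([X, [[8, -8], [9, -7], [10, -9]]])

classical = EmpiricalCovariance().fit(X_contaminated)
robust = MinCovDet(random_state=0).fit(X_contaminated)

# Both methods return *squared* Mahalanobis distances.
d2_classical = classical.mahalanobis(X_contaminated)
d2_robust = robust.mahalanobis(X_contaminated)

# The robust distances make the three planted outliers stand out far more.
print(d2_classical[-3:])
print(d2_robust[-3:])
```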
Applications in Machine Learning
Mahalanobis distance sits at the heart of several classification algorithms. In linear discriminant analysis (LDA), the classifier assigns each new observation to whichever class has the closest mean in Mahalanobis distance terms, while also factoring in how common each class is. This means the algorithm naturally handles features that vary on different scales or that are correlated, without requiring you to manually standardize anything.
It also appears in nearest-neighbor classifiers, where using Mahalanobis distance instead of Euclidean distance can dramatically improve performance on datasets where features are correlated or differently scaled. Anomaly detection systems rely on it too: in manufacturing quality control, for instance, a product whose measurements have a high Mahalanobis distance from the “normal” production distribution gets flagged for inspection.
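The geometric core of these classifiers can be sketched in a few lines: a nearest-class-mean rule that measures distance with a pooled (shared) covariance, which is essentially LDA with the class-frequency term left out for simplicity. The data here is synthetic and the helper names are my own.

```python
import numpy as np

def fit_nearest_mean(X, y):
    """Per-class means plus a pooled inverse covariance shared by all classes."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    # Pooled covariance: within-class scatter averaged across classes.
    pooled = sum(np.cov(X[y == c], rowvar=False) * (np.sum(y == c) - 1)
                 for c in classes) / (len(X) - len(classes))
    return means, np.linalg.inv(pooled)

def predict(x, means, cov_inv):
    """Assign x to the class whose mean is closest in Mahalanobis distance."""
    def d2(c):
        diff = x - means[c]
        return diff @ cov_inv @ diff
    return min(means, key=d2)

# Toy data: two correlated classes.
rng = np.random.default_rng(5)
cov = [[1.0, 0.8], [0.8, 1.0]]
X = np.vstack([rng.multivariate_normal([0, 0], cov, 100),
               rng.multivariate_normal([2, 2], cov, 100)])
y = np.array([0] * 100 + [1] * 100)

means, cov_inv = fit_nearest_mean(X, y)
print(predict(np.array([0.2, -0.1]), means, cov_inv))  # near class 0's mean
```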
Applications in Finance
Financial analysts use Mahalanobis distance to spot anomalies in company performance and assess creditworthiness. One study of 254 manufacturing companies listed on the Istanbul Stock Exchange analyzed financial data over nearly a decade using the metric. The results classified 89.6% of observations as normal (low or moderate risk), while 10.4% were flagged as anomalies or high-risk. Because the distance accounts for correlations between financial indicators, it can catch companies whose combination of metrics is unusual even when no single metric looks alarming on its own.
Portfolio managers also use it for forecast evaluation, measuring how far actual asset returns deviate from predictions. Large Mahalanobis distances between predicted and realized returns signal that a model may be unreliable. And by analyzing the covariance structure between company indices, researchers have used the metric to automatically identify clusters of companies that behave as independent “submarkets,” revealing structure that simpler methods would miss.
When It Works Best, and When It Doesn’t
Mahalanobis distance assumes the underlying data is roughly multivariate normal, meaning the data cloud is shaped like an ellipsoid. When this holds, the metric has a clean statistical interpretation and the chi-squared threshold for outlier detection is valid. For many real-world datasets involving biological measurements, financial returns, or sensor readings, this assumption is reasonable enough to be useful.
When the data is highly non-normal (multimodal, heavily skewed, or with complex nonlinear relationships between variables), the single mean and covariance matrix can’t capture the true shape of the distribution, and Mahalanobis distance becomes less meaningful. In those cases, you might need to apply it within individual clusters rather than to the dataset as a whole, or switch to a different distance metric entirely. You also need enough observations relative to the number of variables. If you have 10 variables but only 15 data points, the covariance matrix estimate will be unreliable and the resulting distances won’t mean much.
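Applying the metric within clusters is straightforward once points are assigned to groups. In this sketch the data is deliberately bimodal and the cluster assignment is a crude nearest-center rule using centers I chose by hand; in practice you would get the labels from a clustering algorithm such as k-means.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(11)
# Bimodal data: one global mean/covariance describes neither mode well.
X = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), 300),
               rng.multivariate_normal([10, 10], np.eye(2), 300)])

# Crude assignment for illustration: nearest of two hand-picked centers.
centers = np.array([[0, 0], [10, 10]])
labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)

# Compute squared Mahalanobis distances within each cluster separately.
d2 = np.empty(len(X))
for c in (0, 1):
    mask = labels == c
    mean = X[mask].mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X[mask], rowvar=False))
    diff = X[mask] - mean
    d2[mask] = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

threshold = chi2.ppf(0.975, df=2)
print(f"{(d2 > threshold).sum()} points flagged across both clusters")
```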