What Is Homoskedasticity and Why Does It Matter?

Homoskedasticity means that the random errors in a statistical model have the same variance across all observations. In a regression, it means your predictions are equally “off” whether you’re predicting small values or large ones. This property is one of the key assumptions behind ordinary least squares (OLS) regression, and when it holds, your standard errors, p-values, and confidence intervals are reliable.

The Core Idea

In any regression model, you’re fitting a line (or surface) through data points. The distance between each actual data point and the predicted value is called a residual. Homoskedasticity means the spread of those residuals stays roughly constant no matter where you look along the prediction line. The opposite, heteroskedasticity, means the spread changes, typically growing larger or smaller as the predicted value increases.

A classic example: predicting spending by income. People with low incomes spend almost all their money on necessities, so there’s little variation. People with high incomes have wildly different spending habits: some save aggressively, others spend freely. The variance of the residuals fans out as income rises. That’s heteroskedasticity. If you were instead predicting something where the scatter stayed uniform across all income levels, you’d have homoskedasticity.
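This fan-out pattern is easy to reproduce with simulated data. The sketch below uses hypothetical numbers (NumPy only): it generates spending with noise whose standard deviation grows with income, then compares the residual spread in the bottom and top income quartiles.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
income = rng.uniform(20_000, 200_000, n)        # hypothetical incomes
true_spending = 5_000 + 0.6 * income            # hypothetical "true" relationship
noise = rng.normal(0, 0.15 * income)            # noise sd grows with income
spending = true_spending + noise

# Compare residual spread in the bottom vs. top income quartile
resid = spending - true_spending
low = resid[income < np.quantile(income, 0.25)].std()
high = resid[income > np.quantile(income, 0.75)].std()
print(f"residual sd, low incomes:  {low:,.0f}")
print(f"residual sd, high incomes: {high:,.0f}")
```

With these settings the top-quartile residuals spread out far more than the bottom-quartile ones, which is exactly the fan shape described above.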

Why It Matters for Your Results

When homoskedasticity holds (together with the other Gauss-Markov assumptions, such as uncorrelated errors), the OLS estimator is the best linear unbiased estimator, a result known as the Gauss-Markov theorem. “Best” here means no other linear, unbiased method can produce estimates with smaller variance. Your coefficient estimates are as precise as they can be, and the standard errors OLS computes are correct.

When the assumption breaks down, the coefficient estimates themselves are still unbiased, but the standard errors become unreliable. The direction of the bias depends on the pattern of heteroskedasticity, but in the common case where variance grows with the predictors, OLS standard errors are too small, which makes your t-statistics too large and your confidence intervals too narrow. In practice, this inflates your Type I error rate (false positives). One study found that under strong heteroskedasticity, the rejection rate climbed to 26% when it should have been 5%. Confidence intervals that nominally cover the true value 95% of the time dropped to roughly 75% coverage. You’d be drawing confident conclusions from data that doesn’t actually support them.

How to Spot It Visually

The simplest check is a residual plot: plot your model’s residuals on the vertical axis against the fitted (predicted) values on the horizontal axis. Under homoskedasticity, you’ll see a roughly even band of points scattered around zero with no obvious pattern. Under heteroskedasticity, you’ll typically see a funnel or fan shape where the residuals spread out (or compress) as fitted values increase. A cone that opens to the right is the most common pattern, but any systematic change in spread is a red flag.
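As a minimal sketch (simulated data; the output file name is arbitrary), this diagnostic takes a few lines with NumPy and matplotlib:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                      # non-interactive backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = 2 + 3 * x + rng.normal(0, 1 + x, 300)  # noise sd grows with x

slope, intercept = np.polyfit(x, y, 1)     # simple OLS fit
fitted = intercept + slope * x
residuals = y - fitted

plt.scatter(fitted, residuals, alpha=0.5)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. fitted values")
plt.savefig("residual_plot.png")
```

Because the simulated noise grows with x, the resulting plot shows the classic cone opening to the right.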

Formal Statistical Tests

Visual inspection is a good first step, but formal tests give you a p-value to work with. The two most common are the Breusch-Pagan test and the White test. Both use homoskedasticity (constant variance) as the null hypothesis, so a low p-value means you have evidence of heteroskedasticity.

The Breusch-Pagan test checks whether the variance of residuals is linearly related to your predictor variables. It’s straightforward and works well when heteroskedasticity follows a linear pattern. The White test is more flexible because it also includes squared terms and interactions among predictors, so it can detect nonlinear forms of heteroskedasticity that Breusch-Pagan would miss. Both are available in standard software: Python’s statsmodels library provides het_breuschpagan and het_white functions, and R has equivalent packages.

A third option, the Goldfeld-Quandt test, splits your data into two groups and checks whether the variance differs between them. It’s simpler but less general.

What to Do When Variance Isn’t Constant

If you detect heteroskedasticity, you have several practical options depending on how severe it is and what you’re trying to accomplish.

Robust Standard Errors

The most common modern fix is to keep OLS but replace the standard errors with heteroskedasticity-consistent (HC) standard errors, sometimes called “robust” standard errors. This doesn’t change your coefficient estimates. It corrects the standard errors so your p-values and confidence intervals are valid even when variance isn’t constant. In many applied fields, this is the default approach.

Weighted Least Squares

Weighted least squares (WLS) addresses heteroskedasticity directly by giving each observation a weight inversely proportional to its error variance. Observations with large variance (less informative, more noisy) get downweighted, while observations with small variance (more precise) get upweighted. This produces more efficient estimates than OLS when the variance structure is correctly specified. The challenge is that you need to know, or estimate, how the variance changes across observations.

Data Transformations

Sometimes transforming your dependent variable can stabilize the variance. A log transformation is the most common choice, particularly when variance increases proportionally with the mean (as with income or price data). A square root transformation works for count data, and a reciprocal transformation (1 divided by the value) can help in other cases. These transformations compress larger values more than smaller ones, which tends to equalize the spread of residuals. The tradeoff is that your model now predicts the transformed variable, so interpreting coefficients requires back-transformation.
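A minimal sketch of the log approach, using simulated multiplicative noise (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.uniform(1, 10, 1_000)
# Multiplicative noise: spread grows with the mean on the raw scale
y = np.exp(0.5 + 0.3 * x) * rng.lognormal(0, 0.2, 1_000)

# On the log scale the model is linear with roughly constant variance
log_y = np.log(y)
slope, intercept = np.polyfit(x, log_y, 1)

# Back-transformation: a one-unit increase in x multiplies y by exp(slope)
print(np.exp(slope))   # close to exp(0.3) ≈ 1.35
```

Note the interpretation shift: the slope now describes a multiplicative effect on y, not an additive one.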

The right fix depends on context. Robust standard errors are the easiest and most widely used. WLS is worth the effort when you have a good model for the variance structure. Transformations work best when the data naturally lend themselves to a different scale.