What Does Heteroscedasticity Look Like in Regression?

Heteroscedasticity shows up as a fan or cone shape in a residual plot, where the spread of data points widens (or narrows) as you move from left to right. Instead of a uniform band of points scattered evenly around zero, you see the vertical range of residuals growing larger at one end. It’s one of the easiest regression problems to spot visually once you know what to look for.

The Classic Fan Shape

The most common way to check for heteroscedasticity is with a residuals versus fitted values plot. This is a scatter plot with your model’s predicted values on the x-axis and the residuals (the difference between each actual value and the predicted value) on the y-axis. In a well-behaved model, those residuals form a roughly horizontal band centered on zero, with points bouncing randomly above and below the line at a consistent spread.

When heteroscedasticity is present, that band isn’t consistent. The classic pattern looks like a cone or megaphone opening to the right: residuals near the left side of the plot cluster tightly around zero, while residuals on the right side spread out dramatically. The vertical range of the dots keeps increasing as the fitted values increase. Think of it like a triangle lying on its side, with the narrow point at the left and the wide end at the right.

This pattern means your model’s errors are small and predictable for low values but large and erratic for high values. The variance of the errors isn’t constant; it grows along with whatever you’re measuring.
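As a rough sketch, you can reproduce and measure this fan shape with simulated data. Everything here (variable names, the choice to let the noise standard deviation grow with x) is illustrative, not a prescription:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(1, 10, n)
# Error spread grows with x: this builds the fan shape in by construction
y = 2.0 + 3.0 * x + rng.normal(0, 0.5 * x, n)

# Fit a simple linear model by least squares
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
resid = y - fitted

# Compare residual spread in the lower vs upper half of the fitted values:
# under homoscedasticity these should be roughly equal
lo = resid[fitted < np.median(fitted)]
hi = resid[fitted >= np.median(fitted)]
print(f"spread (low fitted values):  {lo.std():.2f}")
print(f"spread (high fitted values): {hi.std():.2f}")
```

Plotting `fitted` against `resid` with any plotting library would show the cone directly; the split-and-compare check above is just a quick numeric stand-in for eyeballing the plot.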

Why Income and Spending Are a Perfect Example

Heteroscedasticity isn’t just a statistical abstraction. It shows up naturally in real data, and the classic example is household spending. If you plot household income against leisure spending (movies, vacations, skiing), low-income households cluster tightly together because there’s simply less room in their budget for variation. A family earning $25,000 a year has limited options for how much they spend on entertainment.

High-income households, on the other hand, spread out wildly. One family earning $200,000 might spend modestly on leisure while another at the same income level spends lavishly. The result, when you plot it, is that unmistakable cone shape: tight on the left, fanning out on the right. The error in predicting spending grows proportionally with income because wealthier households simply have more room to vary.

Less Common Patterns

The fan shape isn’t the only form heteroscedasticity takes. Sometimes the residual plot looks like a bow tie or hourglass, where the spread is wide at both ends of the x-axis but narrow in the middle. This happens when the variance of your outcome variable is large for extreme values of your predictor but small near the center of the distribution.

The reverse pattern also exists: residuals that are tightly clustered at both extremes but spread wide in the middle. Which pattern you see depends on the relationship between your predictor variable and the underlying variance of your data. The fan shape is by far the most common in practice, but any residual plot where the vertical spread changes systematically (rather than staying roughly constant) signals heteroscedasticity.

What Homoscedasticity Looks Like for Comparison

It helps to know what “normal” looks like. In a homoscedastic residual plot, the dots form a shapeless cloud with roughly equal vertical spread everywhere. At every value along the x-axis, the y-values of the dots have about the same variance. There’s no widening, no narrowing, no pattern at all. It looks like someone sprayed points randomly within a horizontal band. That’s what you want to see. Any systematic change in the width of that band, whether it widens, narrows, or pulses, is a sign that the constant-variance assumption is violated.

How to Confirm What You’re Seeing

Visual inspection is the first step, but your eyes can sometimes see patterns that aren’t really there, especially in small datasets. Formal statistical tests can confirm whether the uneven spread is significant. The most widely used is the Breusch-Pagan test, which starts from the assumption that variance is constant (that’s the null hypothesis) and tests whether the data provide evidence against it. A low p-value means the uneven spread you’re seeing in the plot is unlikely to be mere sampling noise.

For most practical purposes, though, the residual plot is your best diagnostic tool. Penn State’s statistics program calls the residuals versus fits plot “the most frequently created plot” in residual analysis, and it’s designed to detect exactly three things: non-linearity, outliers, and unequal error variances. If you see a fan, you’ve found the third one.

Why It Matters for Your Results

Heteroscedasticity doesn’t bias your regression coefficients. Your estimated slopes and intercepts are still pointed at the right answer on average. The problem is with everything around those estimates: the standard errors become unreliable, which means your confidence intervals and p-values are wrong. You might conclude a relationship is statistically significant when it isn’t, or miss a real effect because the standard errors are inflated. The regression line itself is fine; the measure of how confident you should be in that line is not.

Common Fixes

If your residual plot shows that classic cone shape, you have a few practical options. The simplest is a log transformation of your outcome variable. Taking the logarithm compresses large values more than small ones, which often stabilizes the variance and turns that fan shape into a uniform band. This works especially well when variance grows proportionally with the predicted value, which is the most common real-world pattern.
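Here is a small sketch of the log transformation at work. The data are simulated with multiplicative noise, so the spread grows in proportion to the mean; taking logs turns that into a linear relationship with constant spread. All parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 10, n)
# Multiplicative noise: the spread of y grows in proportion to its mean
y = np.exp(0.5 + 0.3 * x) * rng.lognormal(0, 0.2, n)

def half_spread_ratio(x, y):
    """Residual spread in the upper half of x divided by spread in the lower half.
    A ratio near 1 suggests constant variance; well above 1 suggests a fan shape."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    med = np.median(x)
    return resid[x >= med].std() / resid[x < med].std()

print(f"raw y ratio:    {half_spread_ratio(x, y):.2f}")        # well above 1
print(f"log(y) ratio:   {half_spread_ratio(x, np.log(y)):.2f}") # near 1
```

After the transform, `log(y)` is exactly linear in `x` with homoscedastic noise, which is why this fix works so cleanly when variance scales with the predicted value.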

If a log transformation doesn’t solve the problem, weighted least squares regression gives less weight to observations with high variance and more weight to observations with low variance. In effect, it tells the model to pay less attention to the noisy, spread-out points and more attention to the tightly clustered ones. Many statistical software packages handle this automatically once you specify the weights.

A third option, and often the easiest, is to use robust standard errors. This approach keeps your original regression model intact but recalculates the standard errors to account for the uneven spread. Your coefficients stay the same, but your p-values and confidence intervals become trustworthy again.