A high VIF (variance inflation factor) is generally a value of 5 or above, signaling that one of your predictor variables is strongly correlated with others in a regression model. A VIF of 10 or higher is widely considered severe. When VIF climbs into that range, the coefficient estimates your regression produces become unreliable, and variables that truly matter can appear statistically insignificant.
What VIF Actually Measures
VIF quantifies how much the variance of a regression coefficient gets “inflated” because of correlation between predictor variables. The formula is straightforward: for any given predictor, VIF equals 1 divided by (1 minus the R-squared you’d get if you regressed that predictor against all the other predictors in your model).
A VIF of 1 means a predictor has zero correlation with the others. Its coefficient estimate is as precise as it can be. A VIF of 5 means the variance of that coefficient is five times larger than it would be if the predictor were completely independent. A VIF of 10 means the variance is ten times larger. The higher the number, the less you can trust what the model tells you about that variable’s effect.
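The definition can be computed directly. The sketch below (plain NumPy, with made-up data and a hypothetical helper name) regresses one predictor on the rest, takes the R-squared of that auxiliary regression, and returns 1 / (1 − R²):

```python
import numpy as np

def vif_from_definition(X, j):
    """VIF for column j of X: regress it on all other columns,
    then compute 1 / (1 - R^2). Illustrative helper, not a library call."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(others)), others])  # intercept + other predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r_squared = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r_squared)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)               # independent of x1
x3 = x1 + 0.1 * rng.normal(size=200)    # nearly a copy of x1
X = np.column_stack([x1, x2, x3])
vif_independent = vif_from_definition(X, 1)  # near 1: x2 is uncorrelated
vif_redundant = vif_from_definition(X, 0)    # large: x1 and x3 overlap heavily
print(vif_independent, vif_redundant)
```

With these fabricated data, the uncorrelated predictor comes out near 1 while the near-duplicate pair produces a VIF well above 10, matching the interpretation above.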
Common Thresholds
The two most cited cutoffs are 5 and 10. A VIF below 5 is typically considered acceptable. Between 5 and 10, multicollinearity is moderate and worth investigating. Above 10, most analysts treat it as a serious problem that needs to be addressed before interpreting results.
Some fields use stricter standards. In medical and public health research, a VIF above 2.5 sometimes raises flags, because the consequences of drawing wrong conclusions carry more weight. In exploratory social science work, researchers may tolerate VIFs up to 10 if the model’s overall predictive power is the goal rather than interpreting individual coefficients. The right threshold depends on what you’re trying to do with the model.
How a High VIF Distorts Your Results
When two or more predictors are highly correlated, the model struggles to separate their individual effects. The result is inflated variance in the coefficient estimates, which causes a chain reaction of problems.
First, the standard errors of those coefficients balloon. Standard error is the square root of variance, so a variance inflated five or ten times means standard errors roughly 2.2 or 3.2 times larger (the square roots of 5 and 10). Second, larger standard errors produce wider confidence intervals, meaning the model is far less certain about the true value of each coefficient. Third, the t-statistic used to test whether a coefficient differs from zero shrinks. A smaller t-statistic means a larger p-value, which can make a genuinely important predictor appear statistically insignificant. You could end up dropping a variable that actually matters, or concluding it has no effect when it does.
The coefficients themselves can also become unstable. Small changes in your data, like adding or removing a few observations, can cause large swings in the estimated values. Two correlated predictors might even flip signs, with one appearing to have a positive effect and the other negative, when both actually push in the same direction.
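The inflation is easy to reproduce in a small simulation. In this sketch (invented data, same true coefficients and same noise in both scenarios), the standard error of x1's coefficient is computed once with an uncorrelated companion predictor and once with a strongly correlated one:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
eps = rng.normal(size=n)

def se_of_x1(x2):
    """OLS standard error of x1's coefficient when x1 is paired with x2:
    sqrt of sigma^2 times the matching diagonal entry of (X'X)^-1."""
    y = 2.0 * x1 + 1.0 * x2 + eps            # same true effects either way
    X = np.column_stack([np.ones(n), x1, x2])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    return np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

x2_indep = rng.normal(size=n)                      # uncorrelated companion
x2_corr = 0.95 * x1 + 0.3 * rng.normal(size=n)     # strongly correlated companion
se_indep = se_of_x1(x2_indep)
se_corr = se_of_x1(x2_corr)
print(se_indep, se_corr)  # the second is several times larger
```

Nothing about x1's true effect changed between the two runs; only the correlation structure did, yet the standard error grows by roughly the square root of the VIF.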
When a High VIF Is Not a Problem
Not every high VIF requires action. If your model includes polynomial terms (like a squared version of a variable) or interaction terms (where two variables are multiplied together), those terms will naturally be correlated with the original variables they’re built from. Their VIFs will be high by construction, and that’s expected.
The University of Wisconsin–Madison's Social Science Computing Cooperative recommends fitting a simpler model without the polynomial and interaction terms, then checking VIFs again. If the simpler model shows acceptable VIFs, the high values in the full model are just an artifact of how those terms are created, not a sign of a real multicollinearity problem.
Similarly, if your only goal is prediction and you don’t care about interpreting individual coefficients, high VIF is less concerning. The model’s overall predictions can still be accurate even when individual coefficient estimates are unstable.
How to Fix High VIF
The most direct fix is removing one of the correlated predictors. If two variables measure essentially the same thing (say, height in inches and height in centimeters), keeping both adds no information and inflates VIF. Choose the one that’s more relevant to your research question and drop the other.
Combining correlated variables into a single measure is another option. If you have several survey questions that all capture the same underlying concept, averaging them or using a technique like principal component analysis to create one composite score eliminates the redundancy while preserving the information.
Centering your variables, meaning subtracting the mean from each value, can help specifically with multicollinearity caused by polynomial or interaction terms. It doesn’t change the relationships in your data, but it reduces the artificial correlation between a variable and its squared or interaction terms.
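A quick check with fabricated data shows the effect. Here a positive-valued predictor (a hypothetical "age" column) is almost perfectly correlated with its own square, and centering removes nearly all of that artificial correlation:

```python
import numpy as np

rng = np.random.default_rng(2)
age = rng.uniform(20, 60, size=1000)   # made-up predictor, strictly positive

raw_corr = np.corrcoef(age, age ** 2)[0, 1]            # very high by construction
centered = age - age.mean()                             # subtract the mean
centered_corr = np.corrcoef(centered, centered ** 2)[0, 1]  # near zero
print(raw_corr, centered_corr)
```

The relationship between age and the outcome is untouched; only the correlation between the linear and squared terms changes, which is exactly what lowers their VIFs.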
Collecting more data sometimes helps, but only if the correlation between predictors is partly a quirk of your sample rather than a fundamental feature of how those variables relate in the real world. If two variables are inherently linked (like age and years of work experience), no amount of additional data will break that correlation.
Checking VIF in Practice
Most statistical software calculates VIF automatically. In R, the vif() function from the car package is standard. In Python, statsmodels provides variance_inflation_factor() in its statsmodels.stats.outliers_influence module. SPSS and Stata include it as an option in regression output.
You’ll get one VIF value for each predictor in your model. Check them all. A single variable with a VIF of 25 is a clear problem, but so is a pattern where several variables sit between 5 and 10. Some analysts also look at the tolerance value, which is simply 1 divided by VIF. A tolerance below 0.2 (corresponding to a VIF of 5) signals moderate concern, and below 0.1 (VIF of 10) signals a serious issue.
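The full check can be sketched in a few lines. This example computes VIF and tolerance for every predictor from the definition (the same quantity the library functions report) and flags each against the 5 and 10 cutoffs; the predictor names and data are invented for illustration:

```python
import numpy as np

def vif_table(X, names):
    """VIF and tolerance (1 / VIF) for every column of X, via the
    auxiliary-regression definition of VIF."""
    out = {}
    for j in range(X.shape[1]):
        y = X[:, j]
        A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vif = 1.0 / (1.0 - r2)
        out[names[j]] = (vif, 1.0 / vif)
    return out

rng = np.random.default_rng(3)
income = rng.normal(size=400)
education = 0.9 * income + 0.4 * rng.normal(size=400)  # correlated with income
hours = rng.normal(size=400)                           # independent
table = vif_table(np.column_stack([income, education, hours]),
                  ["income", "education", "hours"])
for name, (vif, tol) in table.items():
    flag = "serious" if vif > 10 else "moderate" if vif > 5 else "ok"
    print(f"{name}: VIF={vif:.2f}  tolerance={tol:.3f}  ({flag})")
```

Checking every row of the table, not just the largest value, catches the pattern of several moderate VIFs described above.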
The key is to check VIF before interpreting your regression coefficients, not after. If you’ve already drawn conclusions from a model with high multicollinearity, those conclusions may not hold up once the correlated predictors are addressed.

