The Durbin-Watson test is a statistical test that checks whether the errors in a regression model are correlated with each other over time. It produces a single number between 0 and 4, where a value of 2.0 means no correlation was detected. This test matters because correlated errors violate a core assumption of linear regression, which can make your model’s predictions and confidence intervals unreliable.
What the Test Actually Measures
When you run a regression, you get residuals: the differences between your predicted values and the actual data points. In a well-behaved model, those residuals should be random. But with time-series data or any data collected in sequence, residuals often follow a pattern. If today’s error is positive, tomorrow’s tends to be positive too. That pattern is called autocorrelation, or serial correlation.
The Durbin-Watson test specifically looks for first-order autocorrelation, meaning it checks whether each residual is correlated with the one immediately before it. It does not detect more complex patterns where, say, every third or fourth residual is correlated. The formal setup tests a null hypothesis that the errors are uncorrelated against an alternative hypothesis that each error is a fixed multiple of the previous error plus fresh random noise (a first-order autoregressive process).
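The alternative hypothesis is easy to picture by simulating it. The sketch below (plain Python; the coefficient 0.8 and the function names are my own choices for illustration) generates errors where each one is 0.8 times the previous error plus fresh noise, then measures the lag-1 correlation:

```python
import random

def simulate_ar1_errors(n, rho, seed=0):
    """Generate n errors where e[t] = rho * e[t-1] + fresh Gaussian noise."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        errors.append(rho * errors[-1] + rng.gauss(0.0, 1.0))
    return errors

def lag1_correlation(e):
    """Sample correlation between each error and the one immediately before it."""
    x, y = e[:-1], e[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

errors = simulate_ar1_errors(500, rho=0.8)
print(lag1_correlation(errors))  # close to 0.8: strong positive autocorrelation
```

With rho set to 0 the same code produces a lag-1 correlation near zero, which is the null hypothesis.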
How the D-Statistic Works
The test statistic, usually called “d,” is the sum of squared differences between each consecutive pair of residuals, divided by the total sum of squared residuals. In plain terms, it compares how much the residuals change from one observation to the next relative to their overall size.
There’s a useful shortcut for understanding the result. The d-statistic is approximately equal to 2 times (1 minus the autocorrelation coefficient). So if the autocorrelation is zero, d lands at 2.0. If residuals are perfectly positively correlated (autocorrelation of 1), d drops to 0. If they’re perfectly negatively correlated (autocorrelation of -1), d rises to 4.0.
This relationship makes interpretation straightforward:
- d near 2.0: No autocorrelation. Your residuals look independent, which is what you want.
- d between 0 and 2: Positive autocorrelation. Residuals tend to follow the same direction as the previous one.
- d between 2 and 4: Negative autocorrelation. Residuals tend to flip sign from one observation to the next.
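Both the formula and the interpretation above are easy to verify on toy residuals. A minimal pure-Python sketch (the function name is my own):

```python
def durbin_watson_d(resid):
    """d = sum of squared consecutive differences / sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Perfectly alternating residuals (strong negative autocorrelation) push d toward 4:
print(durbin_watson_d([1.0, -1.0] * 5))  # 3.6 for this short series
# Residuals stuck at the same value (extreme positive autocorrelation) give d = 0:
print(durbin_watson_d([1.0] * 10))       # 0.0
```

Note the alternating series gives 3.6 rather than exactly 4.0 only because the series is short; the limiting values 0 and 4 hold as the sample grows.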
Reading the Results: Critical Values
Unlike many statistical tests that give you a clean “reject or don’t reject” answer, the Durbin-Watson test has three possible outcomes: reject, fail to reject, or inconclusive. This happens because the exact distribution of the test statistic depends on the specific regressor values in your model, so Durbin and Watson tabulated two boundary values, called dL (lower) and dU (upper), rather than a single cutoff.
For testing positive autocorrelation (the most common concern), the decision rules work like this. If your d-statistic falls below dL, you reject the null hypothesis and conclude that positive autocorrelation is present. If d falls above dU, you fail to reject and treat the residuals as independent. If d lands between dL and dU, the test is inconclusive.
The exact dL and dU values depend on your sample size and the number of independent variables in your model. These are found in published tables, typically at the 5% significance level. As a rough rule of thumb, d values between about 1.5 and 2.5 are often treated as unremarkable, but this shortcut is no substitute for looking up the bounds for your actual sample size and model.
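The three-way decision rule for positive autocorrelation can be written out directly. The dL and dU values in the example are hypothetical placeholders; real ones must come from a published table for your sample size and number of predictors:

```python
def dw_decision(d, d_lower, d_upper):
    """Three-way decision rule for the test of positive autocorrelation."""
    if d < d_lower:
        return "reject: positive autocorrelation present"
    if d > d_upper:
        return "fail to reject: treat residuals as independent"
    return "inconclusive"

# Hypothetical bounds for illustration only; look up real dL/dU in a table.
print(dw_decision(1.20, 1.50, 1.59))  # reject: positive autocorrelation present
print(dw_decision(1.55, 1.50, 1.59))  # inconclusive
print(dw_decision(2.05, 1.50, 1.59))  # fail to reject: treat residuals as independent
```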
Why Autocorrelation Matters
If your regression residuals are autocorrelated, the model isn’t necessarily wrong in its estimates of the relationship between variables. The coefficients themselves may still be unbiased. The real problem is with the standard errors: with positive autocorrelation they are typically underestimated, which makes your results look more statistically significant than they actually are. Confidence intervals become too narrow, and p-values become misleadingly low. You end up thinking your findings are more precise and reliable than they really are.
This issue comes up most often with time-series data, where observations have a natural order. Stock prices, monthly sales figures, temperature readings, and economic indicators collected over time are all prone to autocorrelated errors. Cross-sectional data (like a survey of different people at one point in time) is less susceptible, though spatial correlation can sometimes cause similar problems.
Assumptions and Limitations
The Durbin-Watson test requires a few conditions to work properly. Your regression model needs to include an intercept term. The errors should be normally distributed with a constant variance. And critically, the test only detects first-order autocorrelation. If your residuals have a seasonal pattern, with correlation at lag 4 (quarterly data) or lag 12 (monthly data), the Durbin-Watson test can miss it entirely.
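This blind spot is easy to demonstrate. The residual series below (an arbitrary block of four values, chosen for this illustration) repeats exactly, so each residual equals the one four steps back, a perfect lag-4 dependence. But the block is constructed so that consecutive residuals are nearly uncorrelated, and the d-statistic lands close to 2 as if nothing were wrong:

```python
# An arbitrary block whose circular lag-1 correlation is zero by construction.
block = [1.0, 0.3, -1.0, -0.3]
resid = block * 25  # each residual equals the one 4 steps earlier

num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
den = sum(e ** 2 for e in resid)
d = num / den
print(round(d, 3))  # 1.969 — the Durbin-Watson statistic sees nothing wrong

# Yet the lag-4 dependence is exact:
print(all(resid[t] == resid[t - 4] for t in range(4, len(resid))))  # True
```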
Another limitation: the test doesn’t work well when your regression includes a lagged version of the dependent variable as a predictor. In that situation, the test tends to produce d-values close to 2.0 regardless of whether autocorrelation exists, giving you a false sense of security. Durbin later proposed a separate statistic, the h test, for exactly this situation, though the Breusch-Godfrey test described below is now the more common remedy.
Alternatives for More Complex Patterns
The Breusch-Godfrey test addresses several of the Durbin-Watson test’s shortcomings. It can detect autocorrelation at higher orders, not just lag 1, meaning it catches seasonal and longer-cycle patterns. It also works in models that include lagged dependent variables. The test runs an auxiliary regression on the residuals and uses an R-squared-based statistic that follows a chi-squared distribution, making it more flexible in practice.
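A simplified pure-Python sketch of the idea (function names are my own): regress each residual on an intercept and its own first p lags, then use the number of usable observations times the R-squared of that auxiliary regression as the statistic. The full Breusch-Godfrey test also includes the original regressors in the auxiliary regression, which this sketch omits for brevity:

```python
def ols_fit(X, y):
    """Least squares via normal equations and Gaussian elimination (small k only)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):                       # forward elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):             # back substitution
        b[r] = (c[r] - sum(A[r][j] * b[j] for j in range(r + 1, k))) / A[r][r]
    return b

def breusch_godfrey_lm(resid, p):
    """Simplified LM statistic: regress e_t on an intercept and its first p lags,
    then return (number of usable observations) * R^2."""
    X = [[1.0] + [resid[t - lag] for lag in range(1, p + 1)]
         for t in range(p, len(resid))]
    y = resid[p:]
    b = ols_fit(X, y)
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return len(y) * (1.0 - ss_res / ss_tot)  # ~ chi-squared(p) under the null

# Perfectly alternating residuals: the lag-1 regression fits exactly, so R^2 = 1
# and the statistic equals the number of usable observations (9 here).
print(breusch_godfrey_lm([1.0, -1.0] * 5, p=1))  # 9.0
```

A large statistic relative to the chi-squared distribution with p degrees of freedom is evidence of autocorrelation at some lag up to p.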
For most modern time-series analysis, the Breusch-Godfrey test is the more versatile choice. The Durbin-Watson test remains popular partly because of its simplicity and because many software packages report it automatically alongside regression output.
Running the Test in Practice
Most statistical software computes the Durbin-Watson statistic with minimal effort. In Python’s statsmodels library, the function durbin_watson() takes your model’s residuals as input and returns the d-statistic directly. In R, the dwtest() function from the lmtest package works on the fitted model object and also provides a p-value. SPSS and Stata include it as a standard option in their regression output.
Once you have the d-value, compare it to the critical values for your sample size and number of predictors. If the result suggests autocorrelation is present, common next steps include adding lagged variables to your model, using a different estimation method that accounts for correlated errors, or switching to a time-series model designed to handle serial dependence directly.
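As a concrete example of an estimation method that accounts for AR(1) errors, here is a minimal sketch of a single Cochrane-Orcutt step for a one-predictor model (function names, the 0.7 coefficient, and the simulated data are my own illustrative choices; real analyses would iterate this step or use a library implementation):

```python
import random

def ols_simple(x, y):
    """Closed-form simple OLS: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def cochrane_orcutt_step(x, y):
    """One Cochrane-Orcutt iteration: estimate rho from the OLS residuals,
    quasi-difference the data, and re-run OLS on the transformed series."""
    a, b = ols_simple(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Estimated lag-1 autocorrelation of the residuals.
    rho = (sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
           / sum(e ** 2 for e in resid[:-1]))
    # Quasi-differenced data: y*_t = y_t - rho*y_{t-1}, same for x.
    xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    a_star, b_star = ols_simple(xs, ys)
    # The transformed intercept estimates a*(1 - rho); undo that scaling.
    return a_star / (1.0 - rho), b_star, rho

# Simulated data: y = 2 + 3x with AR(1) errors (coefficient 0.7).
rng = random.Random(1)
e = [rng.gauss(0.0, 1.0)]
for _ in range(199):
    e.append(0.7 * e[-1] + rng.gauss(0.0, 1.0))
x = [float(t) for t in range(200)]
y = [2.0 + 3.0 * x[t] + e[t] for t in range(200)]
a_hat, b_hat, rho_hat = cochrane_orcutt_step(x, y)
print(round(b_hat, 2), round(rho_hat, 2))  # slope near 3, rho near 0.7
```

The payoff is not the coefficient estimates themselves (plain OLS is already unbiased here) but that the transformed model’s standard errors are trustworthy again.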

