What Is the R² Value and What Does It Tell You?

The R² value (also called the coefficient of determination) tells you how much of the variation in one variable is explained by another. If you run a regression model predicting home prices based on square footage and get an R² of 0.75, that means 75% of the variation in prices is accounted for by differences in square footage. The remaining 25% comes from other factors your model doesn’t capture.
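To make this concrete, here's a minimal Python sketch using NumPy. The square-footage and price figures are invented for illustration, not real market data:

```python
import numpy as np

# Hypothetical data: square footage and sale price (in $1,000s) for six homes.
sqft = np.array([1100, 1400, 1650, 1800, 2100, 2500])
price = np.array([199, 245, 319, 240, 312, 405])

# Fit a simple linear regression: price = slope * sqft + intercept.
slope, intercept = np.polyfit(sqft, price, 1)
predicted = slope * sqft + intercept

# R² = 1 - (unexplained variability / total variability).
sse = np.sum((price - predicted) ** 2)     # error the model still makes
sst = np.sum((price - price.mean()) ** 2)  # total variability in prices
r_squared = 1 - sse / sst
print(f"R² = {r_squared:.3f}")
```

Whatever value this prints reads the same way as in the example above: that fraction of the price variation tracks square footage, and the rest comes from everything else.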

In standard linear regression fitted with an intercept, R² is always a number between 0 and 1. An R² of 1 means every data point falls perfectly on the regression line. An R² of 0 means the predictor variable explains none of the variation at all, and the regression line is perfectly flat: it simply predicts the average every time.

How R² Is Calculated

R² works by comparing how much error your model produces against the simplest possible baseline: just guessing the average every time. The calculation involves three sums of squares:

  • Total sum of squares (SST): How far each observed value falls from the overall average. This is the total variability you’re trying to explain.
  • Regression sum of squares (SSR): How far each predicted value falls from the overall average. This is the variability your model captures.
  • Error sum of squares (SSE): How far each observed value falls from the model’s prediction. This is the variability your model misses.

In linear regression with an intercept, these three quantities follow a clean relationship: SST = SSR + SSE. The total variability equals the explained part plus the unexplained part.

R² is simply the explained portion divided by the total: R² = SSR / SST. You can also write it as R² = 1 − (SSE / SST), which reads as “one minus the fraction of variability the model fails to explain.” Both formulas give the same result.
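Here's a small sketch (made-up numbers, NumPy, ordinary least squares with an intercept) that computes all three sums of squares and checks both identities:

```python
import numpy as np

# Hypothetical data, fit with ordinary least squares (intercept included),
# which is what makes the SST = SSR + SSE decomposition hold exactly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.0, 5.0, 4.0, 7.0, 9.0, 8.0])
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)      # total variability
ssr = np.sum((y_hat - y.mean()) ** 2)  # variability the model captures
sse = np.sum((y - y_hat) ** 2)         # variability the model misses

print(np.isclose(sst, ssr + sse))            # → True: SST = SSR + SSE
print(np.isclose(ssr / sst, 1 - sse / sst))  # → True: both R² formulas agree
```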

R² and the Correlation Coefficient

If you’re working with simple linear regression (one predictor, one outcome), R² is literally the square of the Pearson correlation coefficient, r. A correlation of 0.9 between two variables gives an R² of 0.81. A correlation of −0.7 gives an R² of 0.49. Squaring removes the sign, so R² doesn’t tell you whether the relationship is positive or negative. It only tells you how strong it is.
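A quick check in Python (invented numbers with a deliberately negative relationship) shows the squaring step and the lost sign:

```python
import numpy as np

# Hypothetical data where y falls as x rises (a negative relationship).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.2, 8.1, 6.3, 3.9, 2.2])

r = np.corrcoef(x, y)[0, 1]  # Pearson r, negative for this data

# Fit the simple linear regression and compute R² directly.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(r < 0)                          # → True: the correlation is negative
print(np.isclose(r ** 2, r_squared))  # → True: R² is just r squared here
```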

This connection breaks down with multiple regression, where you have several predictors. In that case, R² still measures explained variance, but it’s no longer the square of a single correlation.

What Counts as a “Good” R² Value

There is no universal threshold. What counts as good depends entirely on the field and the type of data you’re working with.

In physics or engineering, where systems behave predictably and measurements are precise, R² values above 0.95 are common and expected. In social sciences or behavioral research, an R² of 0.30 to 0.50 can be perfectly respectable because human behavior is inherently variable. In medical research, R² values can be quite low while still representing meaningful, even life-saving findings. A drug treatment might explain only a small fraction of the variation in individual patient outcomes yet still show statistically significant benefits across thousands of patients. That small effect, applied at scale, can be worth millions of dollars and save many lives.

The key lesson: a low R² does not mean your model is useless, and a high R² does not guarantee your model is correct. R² measures explained variation, not whether you’ve identified the right cause or built a reliable prediction tool.

Why R² Can Be Misleading

R² has a built-in flaw in multiple regression. Every time you add a new predictor variable to your model, R² goes up (or, at worst, stays the same), even if that variable is completely irrelevant. Throw in random noise as a predictor and R² will still tick upward. This happens because more predictors always allow the model to fit the existing data points a little more closely, regardless of whether that fit is meaningful.

This is where adjusted R² comes in. Adjusted R² applies a penalty based on the number of predictors relative to the number of data points. If a new variable genuinely improves the model, adjusted R² goes up. If the variable is just adding noise, adjusted R² goes down. When you’re comparing models with different numbers of predictors, adjusted R² is the more honest metric.
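A small simulation illustrates both behaviors (the data, seed, and sample size here are arbitrary choices; the adjusted-R² formula is the standard one). The junk column is pure random noise with no relationship to the outcome:

```python
import numpy as np

def r2(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r_squared, n, p):
    # p = number of predictors, not counting the intercept.
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 * x + rng.normal(0, 3, n)  # real signal plus noise
junk = rng.normal(size=n)          # an irrelevant predictor

X1 = np.column_stack([np.ones(n), x])        # intercept + real predictor
X2 = np.column_stack([np.ones(n), x, junk])  # ... plus the junk column

b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)
r2_1, r2_2 = r2(y, X1 @ b1), r2(y, X2 @ b2)

print(r2_2 >= r2_1)  # → True: plain R² never drops when a predictor is added
print(f"adjusted: {adjusted_r2(r2_1, n, 1):.4f} vs {adjusted_r2(r2_2, n, 2):.4f}")
```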

R² Doesn’t Work Well for Nonlinear Models

The clean mathematical logic behind R² depends on a property of linear regression: the total variability (SST) equals the regression variability (SSR) plus the error (SSE). In nonlinear models, this equation doesn’t hold. That means R² loses its core interpretation as “percentage of variance explained.”

Research published in BMC Pharmacology found that R² is rarely affected beyond the third or fourth decimal place even when comparing a correct nonlinear model to a clearly inferior one. In Monte Carlo simulations, R² couldn’t reliably distinguish between good and bad nonlinear fits. Information-based alternatives like AIC and BIC performed significantly better at identifying the correct model. If you’re fitting curves rather than straight lines, R² alone is not a trustworthy measure of how well your model works.

When R² Goes Negative

In standard linear regression with an intercept, R² stays between 0 and 1. But in certain situations it can go negative, which often surprises people seeing it for the first time.

A negative R² means your model fits the data worse than a flat horizontal line at the average. This typically happens when you force constraints on the model that don’t match the data. For example, if you constrain a regression line to cross the Y-axis at 150 but your data clusters around much lower values, the forced line can produce more error than simply predicting the mean for every point. It can also occur when you apply a model built on one dataset to a completely different dataset where the patterns don’t hold.
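The forced-intercept scenario is easy to reproduce. In this sketch (made-up numbers), the line is pinned to a Y-intercept of 150 while the data clusters near 20, and even the best slope under that constraint fits worse than the mean:

```python
import numpy as np

# Hypothetical data clustering near 20, far below the forced intercept.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([20.0, 21.0, 19.0, 22.0, 20.0])

# Least-squares slope with the intercept pinned at 150.
forced_intercept = 150.0
slope = np.sum(x * (y - forced_intercept)) / np.sum(x ** 2)
y_hat = forced_intercept + slope * x

sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r_squared = 1 - sse / sst
print(r_squared < 0)  # → True: worse than just predicting the average
```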

A negative R² is always a red flag. It means something about your model setup is fundamentally wrong for the data in front of you.

What R² Doesn’t Tell You

R² measures how well your model fits the data you already have. It does not tell you whether the relationship is causal, whether the model will predict well on new data, or whether you’ve included the right variables. A model with a high R² can still be overfit, biased, or missing an important predictor entirely.

It’s also possible to have a statistically significant relationship between variables with a very low R². Statistical significance tells you the relationship is unlikely to be zero. R² tells you how much of the variation that relationship accounts for. These are different questions, and both matter. In large datasets, even tiny effects become statistically significant, which means a significant result paired with a low R² is common and not contradictory.
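A simulation makes the point (the effect size, sample size, and seed here are arbitrary choices): with 100,000 observations, an effect that explains well under 1% of the variance still produces a t statistic far beyond any conventional significance cutoff:

```python
import numpy as np

# Simulated large sample with a real but tiny effect.
rng = np.random.default_rng(42)
n = 100_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)  # effect explains roughly 0.25% of variance

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2
# t statistic for testing whether the correlation is zero.
t_stat = r * np.sqrt((n - 2) / (1 - r_squared))

print(r_squared < 0.01)    # → True: R² is tiny
print(abs(t_stat) > 1.96)  # → True: comfortably significant anyway
```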