What Is a Strong R-Squared? Benchmarks by Field

A strong R-squared value depends entirely on the field you’re working in. In physical sciences, anything below 0.90 might be disappointing. In social sciences, 0.30 can be considered strong. There is no universal cutoff, and treating one as gospel is a common mistake that leads to either dismissing useful models or trusting misleading ones.

R-squared (written as R²) tells you the proportion of variation in your outcome that your model explains. An R² of 0.70 means your variables account for 70% of the variation in whatever you’re measuring. The remaining 30% is unexplained, whether from missing variables, randomness, or measurement error. For ordinary least squares with an intercept, it ranges from 0 to 1 on the data used to fit the model, with 1 meaning a perfect fit.
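To make the definition concrete, here is a minimal sketch of computing R² by hand as 1 minus the ratio of unexplained to total variation. The data and the simple linear fit are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical data: a linear relationship plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 2.0, size=x.size)

# Fit a line, then compute R² = 1 - SS_residual / SS_total.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)       # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```

With this signal-to-noise ratio the fit lands somewhere around 0.9: most of the variation in y is accounted for by x, and the rest is the injected noise.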

Benchmarks by Field

The biggest factor in what counts as “strong” is the type of data you’re analyzing. Fields where outcomes are driven by well-understood physical laws naturally produce higher R² values. Fields where human behavior is involved produce lower ones, because people are inherently unpredictable.

In physical sciences and engineering, models routinely hit R² values above 0.90. If you’re modeling, say, how pressure changes with temperature in a closed system, you’d expect something very close to 1.0. An R² of 0.75 in a controlled lab experiment would raise questions about your methodology.

In social sciences and psychology, values as low as 0.10 to 0.30 are often considered acceptable. Human behavior is shaped by hundreds of overlapping factors, most of which can’t be measured in a single study. A model explaining 25% of the variation in a psychological outcome is genuinely informative.

In finance and economics, expectations vary widely depending on what you’re predicting. Stock-level return models often produce R² values between 5% and 30%. One large-scale analysis of U.S. stocks from 1926 to 2010 found median R² values that fluctuated from below 10% to over 50% across different time periods. For stock returns specifically, even 10% explained variance can carry meaningful signal in a noisy environment.

In clinical medicine, roughly the same range as in the social sciences applies, about 0.10 to 0.30, because patient outcomes are influenced by genetics, lifestyle, adherence, and countless unmeasured variables. A predictive model for disease outcomes that achieves an R² of 0.50 or higher would be considered quite strong. For context, one study using machine learning to predict obesity levels from health records achieved R² values ranging from 0.25 (linear regression) up to 0.87 (random forests), with 0.87 being an exceptionally high result for clinical data.

Cohen’s Effect Size Guidelines

Jacob Cohen, the statistician whose benchmarks are widely cited across research, proposed a related measure called f², which maps onto R² via the relation f² = R² / (1 − R²). Under his framework, a small effect corresponds to f² of 0.02 (roughly R² = 0.02), a medium effect to f² of 0.15 (roughly R² = 0.13), and a large effect to f² of 0.35 (roughly R² = 0.26). By these standards, explaining just 26% of the variance qualifies as a large effect in behavioral research. Cohen himself cautioned that these were rough guidelines, not rigid thresholds.
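Inverting Cohen's relation gives R² = f² / (1 + f²), which is where the rough R² equivalents above come from. A quick sketch of the conversion:

```python
# Cohen's f² relates to R² by f² = R² / (1 - R²),
# so the inverse conversion is R² = f² / (1 + f²).
def f2_to_r2(f2: float) -> float:
    return f2 / (1 + f2)

for label, f2 in [("small", 0.02), ("medium", 0.15), ("large", 0.35)]:
    print(f"{label}: f2 = {f2} -> R2 = {f2_to_r2(f2):.2f}")
```

Running this reproduces the benchmarks in the text: 0.02, 0.13, and 0.26 for small, medium, and large effects respectively.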

Why a High R-Squared Can Be Misleading

A very high R² isn’t always good news. One of the most common traps is overfitting, where a model learns the quirks and noise in your specific dataset so thoroughly that it fails when applied to new data. An overfitted model can show near-perfect R² on training data and then perform terribly on anything it hasn’t seen before. The fitted line passes through every data point, including the random noise that has nothing to do with real patterns.

This is especially common when you have many predictor variables relative to a small sample size. Every variable you add will increase R², even if that variable has no real relationship to the outcome. A model with 50 predictors and 60 data points can produce an impressive-looking R² that is essentially meaningless.
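You can see this inflation directly by regressing a purely random outcome on purely random predictors. The sketch below uses the 50-predictor, 60-observation scenario from the text; there is no real relationship anywhere in the data, yet the in-sample R² comes out high:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 60, 50                      # 60 observations, 50 useless predictors
X = rng.normal(size=(n, p))        # pure noise predictors
y = rng.normal(size=n)             # outcome unrelated to X

# Add an intercept column and fit by ordinary least squares.
X1 = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta

r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))   # high in-sample R² despite zero real signal
```

With 50 noise predictors and only 60 points, the expected in-sample R² is roughly p / (n − 1), so a value above 0.8 is typical here. On fresh data the same coefficients would explain essentially nothing.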

Spurious correlation is another risk. Two variables can move together purely by coincidence, particularly in time-series data. If your R² looks suspiciously high and the relationship doesn’t make theoretical sense, that’s a red flag worth investigating.

When to Use Adjusted R-Squared

Standard R² will never decrease when you add more variables to a model, even useless ones. Adjusted R² solves this by penalizing the score for each additional predictor that doesn’t meaningfully improve the fit. If you add a variable and adjusted R² drops, that variable isn’t earning its place in the model.
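The penalty is a simple formula: adjusted R² = 1 − (1 − R²) · (n − 1) / (n − p − 1), where n is the sample size and p the number of predictors. A small sketch shows how the same raw R² is judged very differently depending on how many predictors produced it (the specific numbers are illustrative):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# A raw R² of 0.85 from 60 observations:
print(round(adjusted_r2(0.85, n=60, p=5), 3))    # 5 predictors: barely penalized
print(round(adjusted_r2(0.85, n=60, p=50), 3))   # 50 predictors: fit nearly erased
```

With 5 predictors the adjustment is negligible (about 0.84), but with 50 predictors on 60 data points the adjusted value collapses to near zero, flagging the 50-predictor fit as mostly noise.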

This makes adjusted R² the better metric when you’re comparing models with different numbers of predictors. You can test variables by adding them one at a time and watching whether adjusted R² improves or deteriorates. If it starts declining, you’ve likely added variables that are fitting noise rather than signal. For a single model with a fixed set of predictors, regular R² and adjusted R² tell a similar story.

A Low R-Squared Can Still Be Valuable

One of the most persistent misconceptions is that a low R² means your model is useless. That’s not true. If your predictor variables are statistically significant, they’re telling you something real about the outcome, even if the overall R² is small. A model with R² of 0.08 that identifies a genuine risk factor for heart disease is clinically useful, even though it explains less than 10% of the variation.

As researchers at Duke University have noted, in some contexts an R² of 10% or even less can carry real information value, particularly when you’re searching for a weak signal buried in noisy data where even a faint one would matter. Think of predicting earthquake aftershocks or identifying early biomarkers for disease. You wouldn’t expect to explain most of the variation, but finding any reliable signal is progress.

The practical question isn’t “is my R² above some magic number?” It’s “does my model explain enough variation to be useful for what I need it to do?” A weather model with R² of 0.40 that helps farmers decide when to plant is more valuable than a lab experiment with R² of 0.95 that confirms something already obvious. Context always wins over arbitrary cutoffs.