Correlation r, often called Pearson’s correlation coefficient, is a single number that measures how strongly two variables move together in a straight-line pattern. It ranges from −1 to +1, where +1 means a perfect positive relationship, −1 means a perfect negative (inverse) relationship, and 0 means no linear relationship at all.
If you’ve seen an “r value” in a study, a stats class, or a data report and wondered what it actually tells you, here’s the full picture: what the number means, how to read it, and where it can steer you wrong.
What the Number Actually Tells You
The correlation coefficient r captures two things at once: the direction and the strength of a linear relationship between two measured variables. “Linear” just means the pattern between them roughly follows a straight line when plotted on a graph.
The sign tells you direction. A positive r means both variables tend to increase together (think height and weight). A negative r means one variable tends to decrease as the other increases (think hours of TV watched and physical fitness scores). The strength of the correlation increases as r moves away from zero in either direction, so r = −0.8 represents just as strong a relationship as r = +0.8.
Conceptually, r is calculated by taking the covariance of two variables (a measure of how they vary together) and dividing it by the product of the two variables’ standard deviations. This division is what forces r into the −1 to +1 range, creating a standardized scale you can compare across completely different measurements.
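That definition translates directly into code. Here is a minimal sketch in plain Python (the `pearson_r` helper is mine, using the population form of covariance and standard deviation; libraries like NumPy or SciPy offer built-in equivalents):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product
    of their standard deviations (population form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)

# A perfect positive linear relationship gives r close to +1,
# a perfect inverse one gives r close to -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # ~1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))        # ~-1.0
```

Because the same deviations appear in the numerator and the denominator, the units of measurement cancel out, which is exactly why r is comparable across different kinds of data.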
How to Interpret Different Values
The most widely cited benchmarks come from the psychologist Jacob Cohen, who proposed these thresholds in 1988:
- Small effect: r between 0.1 and 0.3
- Medium effect: r between 0.3 and 0.5
- Large effect: r greater than 0.5
These same thresholds apply to negative values. An r of −0.45 is a medium-strength inverse relationship.
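These benchmarks are easy to encode. The sketch below (the `cohen_label` helper is my own naming, and the treatment of boundary values like exactly 0.3 is a judgment call, since the published bands overlap at their edges) classifies the absolute value and reports the direction separately:

```python
def cohen_label(r):
    """Classify |r| by Cohen's (1988) benchmarks; the sign gives direction."""
    strength = abs(r)
    if strength >= 0.5:
        size = "large"
    elif strength >= 0.3:
        size = "medium"
    elif strength >= 0.1:
        size = "small"
    else:
        size = "negligible"
    direction = "positive" if r > 0 else "negative" if r < 0 else "no"
    return f"{size} {direction}"

print(cohen_label(-0.45))  # medium negative
print(cohen_label(0.7))    # large positive
```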
That said, what counts as “strong” or “weak” depends heavily on the field. In psychology, researchers often label r = 0.7 as strong. In political science, that same value might be called “very strong.” In medicine, some classification systems label r = 0.7 as only moderate. A later review by Hemphill found that across psychological research, correlations above 0.3 already fall in the upper third of reported effect sizes, suggesting that in practice, even values that seem modest on paper can represent meaningful relationships.
The bottom line: don’t judge an r value purely by gut feeling. Context matters. An r of 0.3 linking a single dietary habit to long-term health outcomes could be highly meaningful, while an r of 0.7 between two nearly identical survey questions would be unremarkable.
R-Squared: The Practical Companion
If you square the correlation coefficient, you get r², called the coefficient of determination. This number has a more intuitive interpretation: it tells you the proportion of variation in one variable that is explained by its relationship with the other.
For example, if r = 0.7, then r² = 0.49. That means about 49% of the variation in one variable can be accounted for by its linear relationship with the other. The remaining 51% is driven by other factors. This makes r² especially useful when you want to communicate how much predictive power a relationship actually has. A correlation of r = 0.5 sounds respectable, but squaring it reveals that only 25% of the variation is shared between the two variables.
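The squaring step is simple arithmetic, but seeing the values side by side makes the point (a small illustrative sketch; `r_squared` is my own helper name):

```python
def r_squared(r):
    """Coefficient of determination: the share of variation explained."""
    return r * r

# Respectable-sounding correlations shrink noticeably when squared.
for r in (0.3, 0.5, 0.7, 0.9):
    print(f"r = {r}: r^2 = {r_squared(r):.2f} "
          f"({r_squared(r):.0%} of variation shared)")
```

Note how the drop is steepest for modest correlations: r = 0.3 corresponds to only 9% shared variation.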
When R Can Mislead You
Pearson’s r is designed for one specific type of relationship: linear. If two variables have a strong curved relationship (imagine a U-shape), r can be close to zero even though the variables are clearly connected. For curved or ranked data, alternatives like Spearman’s correlation are better suited. Research in brain connectivity modeling, for instance, has shown that Spearman’s method outperforms Pearson’s when relationships between variables are nonlinear.
Outliers are another major issue. Because the calculation involves means and standard deviations, a single extreme data point can dramatically inflate or deflate r. A scatterplot with 50 tightly clustered points and one wild outlier can produce a misleadingly high or low correlation.
For Pearson’s r to be valid, your data needs to meet a few conditions: both variables must be numerical (not categories like “yes/no”), the distributions should be roughly bell-shaped, the relationship should be approximately linear, and there should be no extreme outliers pulling the result. When these conditions aren’t met, the number you get may not reflect reality.
Correlation Does Not Mean Causation
This is probably the most repeated warning in all of statistics, and for good reason. A high r value tells you that two variables move together. It does not tell you that one causes the other. The classic example: ice cream sales and sunscreen sales are positively correlated across the year, but buying ice cream doesn’t make people buy sunscreen. Both are driven by a third factor, hot weather.
This kind of hidden third variable (sometimes called a confounding variable) is extremely common. Smoking and alcohol use are correlated, but smoking doesn’t cause alcoholism. Two stocks in the same industry might rise and fall together without one driving the other. Correlation identifies patterns. Determining whether one thing actually causes another requires controlled experiments or more advanced statistical methods that can account for outside influences.
It’s also possible to find correlations between variables that have absolutely no meaningful connection, purely by chance. With enough variables to test, some pairs will appear correlated even when the relationship is completely spurious. This is why researchers pair r values with p-values (measures of statistical significance) to assess whether a correlation is likely real or just noise in the data.
Putting It Into Practice
When you encounter an r value, run through a quick mental checklist. First, check the sign: positive means the variables rise together, negative means one falls as the other rises. Second, check the magnitude: values closer to −1 or +1 indicate stronger relationships, while values near zero indicate a weak or nonexistent linear relationship. Third, square it to understand how much shared variation actually exists. An r of 0.4 means r² = 0.16, so only 16% of the variation is shared.
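The first three checklist steps can be bundled into one small helper (a sketch; the `describe_r` name and the “weak/moderate/strong” cutoffs are my own illustrative choices, echoing the benchmarks discussed earlier):

```python
def describe_r(r):
    """Summarize an r value: direction, rough strength, shared variation."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    strength = abs(r)
    if strength >= 0.5:
        label = "strong"
    elif strength >= 0.3:
        label = "moderate"
    else:
        label = "weak"
    return (f"direction: {direction}, strength: {label}, "
            f"shared variation: {r * r:.0%}")

print(describe_r(0.4))  # direction: positive, strength: moderate, shared variation: 16%
```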
Finally, consider the context. Is the relationship likely linear? Could outliers be skewing the result? Could a third variable explain the pattern? A correlation coefficient is a starting point for understanding relationships in data, not the final word.