What Does High Correlation Mean in Statistics?

A high correlation means two variables move together in a strong, predictable pattern. In statistical terms, a correlation coefficient above 0.7 (or below -0.7) is generally considered high, meaning that when one variable changes, the other tends to change in a consistent way. The scale runs from -1 to +1, where values closer to either end indicate a tighter relationship between the two variables.

How Correlation Is Measured

Correlation is expressed as a single number between -1 and +1, called the correlation coefficient (often written as “r”). A value of +1 means two variables move perfectly in sync: as one goes up, the other goes up by a proportional amount every time. A value of -1 means they move in perfectly opposite directions. A value of 0 means there’s no linear relationship at all.
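As a rough sketch, the coefficient can be computed directly from its definition: the covariance of the two variables scaled by both of their standard deviations. The `pearson_r` helper below is illustrative, not taken from any particular library:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
print(round(pearson_r(xs, [2, 4, 6, 8, 10]), 4))   # perfectly in sync: 1.0
print(round(pearson_r(xs, [10, 8, 6, 4, 2]), 4))   # perfectly opposite: -1.0
```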

Most researchers use rough categories to describe the strength of a correlation: below 0.10 is negligible, 0.10 to 0.39 is weak, 0.40 to 0.69 is moderate, 0.70 to 0.89 is high, and 0.90 or above is very strong. These cutoffs are somewhat arbitrary, though. A correlation of 0.39 being labeled “weak” while 0.40 is called “moderate” doesn’t reflect a meaningful difference. The boundaries are guidelines, not hard rules.
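These categories can be sketched as a small helper. The `describe_strength` name and the exact cutoffs are just the rough guidelines from this section; note that the sign is discarded, since it indicates direction rather than strength:

```python
def describe_strength(r):
    """Map |r| to the rough labels used above (guidelines, not hard rules)."""
    a = abs(r)
    if a < 0.10:
        return "negligible"
    if a < 0.40:
        return "weak"
    if a < 0.70:
        return "moderate"
    if a < 0.90:
        return "high"
    return "very strong"

print(describe_strength(0.39))   # weak
print(describe_strength(-0.40))  # moderate (the sign is ignored)
```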

Positive vs. Negative High Correlations

A high positive correlation means both variables increase together. Oil prices and airplane ticket prices, for example, have a correlation around +0.95. When oil gets more expensive, flights tend to cost more too.

A high negative correlation means one variable rises while the other falls. Outdoor temperature and heating bills show a correlation of roughly -0.96: as temperatures drop, heating costs climb. The negative sign doesn’t mean the relationship is weaker. A correlation of -0.90 is just as strong as +0.90. The sign only tells you the direction.

What a High Correlation Looks Like Visually

If you plot two highly correlated variables on a scatter plot, the data points cluster tightly around a line. At r = 0.9, the points form a narrow, elongated oval that clearly slopes upward (for positive) or downward (for negative). At r = 0.3, by contrast, the points spread into a wide, blobby cloud where the trend is hard to see. The tighter the cluster around that imaginary line, the higher the correlation.

High Correlation Does Not Mean One Causes the Other

This is the single most important thing to understand about correlation. Two variables can move together for reasons that have nothing to do with one causing the other. The number of master’s degrees awarded each year correlates highly with box office movie revenue. Video game sales correlate with nuclear energy production. High school graduation rates correlate with donut consumption. None of these pairs has a causal link. Each quantity simply trends upward over time, driven by broad forces like population growth and economic expansion, and any two upward-trending series will correlate whether or not they are connected.
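A quick simulation illustrates the point. Here two completely unrelated series are given the same kind of upward trend over time; the numbers are invented purely for illustration, and `pearson_r` is a hand-rolled helper:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(42)

# Two series with no causal link: each just trends upward over 30 "years"
# with its own independent noise. (Hypothetical numbers.)
years = range(30)
a = [100 + 3 * t + random.gauss(0, 5) for t in years]
b = [50 + 2 * t + random.gauss(0, 5) for t in years]

print(round(pearson_r(a, b), 2))  # high, despite no connection between a and b
```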

Scientists use specific criteria to evaluate whether a high correlation might reflect a real cause-and-effect relationship. They look for a dose-response pattern (more of A consistently produces more of B), whether the same correlation appears across different populations and settings, and whether the relationship holds up after accounting for other variables that could be driving both. A high correlation is a starting point for investigation, not proof that one thing drives the other.

How Much a High Correlation Actually Predicts

Here’s where many people overestimate what a high correlation tells them. The predictive power of a correlation is captured by squaring it. This value, called R-squared, tells you what percentage of the variation in one variable is explained by the other. A correlation of 0.7 sounds impressive, but 0.7 squared is 0.49, meaning only 49% of the variation is accounted for. The other 51% comes from factors you haven’t measured.

Even a correlation of 0.8 only explains 64% of the variation. You need a correlation of about 0.9 before you’re explaining roughly 80% of what’s going on (0.9 squared is 0.81). This is why relying on a single high correlation to make predictions can be misleading. There’s often more unexplained variation than people assume.
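The arithmetic above is simple enough to verify directly. `explained_variation` is just a hypothetical name for squaring the coefficient:

```python
def explained_variation(r):
    """R-squared: the fraction of variation in one variable accounted for by the other."""
    return r ** 2

for r in (0.7, 0.8, 0.9):
    r2 = explained_variation(r)
    print(f"r = {r}: {r2:.0%} explained, {1 - r2:.0%} unexplained")
```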

Things That Can Fake a High Correlation

Several common issues can inflate a correlation coefficient and make a relationship look stronger than it really is.

  • Outliers: A single extreme data point can dramatically change the result. In one documented example, adding just one outlier to a dataset pushed the correlation from 0 to 0.71, creating the appearance of a relationship where none existed.
  • Small sample sizes: With only three to six observations, patterns can appear purely by chance. A high correlation from a tiny dataset is unreliable until confirmed with more data.
  • A hidden third variable: Two things that both increase with population growth, economic development, or the passage of time will show a high correlation even if they’re completely unrelated to each other.
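The outlier effect is easy to reproduce. In this sketch, ten random points with no relationship get one extreme point added; `pearson_r` is a hand-rolled helper and the numbers are illustrative:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)

# Ten points with no underlying relationship: r is typically small.
xs = [random.random() for _ in range(10)]
ys = [random.random() for _ in range(10)]
print(round(pearson_r(xs, ys), 2))

# One extreme outlier far from the cloud manufactures a "relationship".
print(round(pearson_r(xs + [10], ys + [10]), 2))  # jumps close to 1
```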

Statistical Significance Is Not the Same as Strength

One common source of confusion is the difference between how strong a correlation is and whether it’s statistically significant. The p-value attached to a correlation tells you how likely a result at least this extreme would be if there were actually no relationship. It does not tell you how strong the relationship is. With a large enough sample, even a weak correlation of 0.31 can have a very low p-value (less than 0.0001), making it highly “significant” in statistical terms while still being practically unimpressive.

In one clinical dataset, blood pressure readings showed a moderate-to-strong correlation of 0.64 with a p-value below 0.0001. In the same dataset, the correlation between blood pressure and age was just 0.31, with the exact same p-value. Both were statistically significant, but only one represented a meaningfully strong relationship. Always look at the correlation coefficient itself, not just whether the result passed a significance test.
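One way to see the sample-size effect without a statistics library is the t-statistic that underlies the p-value for a correlation, t = r·√(n − 2)/√(1 − r²): the larger |t| is, the smaller the p-value. The sample sizes below are invented for illustration, using the two coefficients from the example above:

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2); larger |t| means a smaller p-value."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# A weak correlation from a large sample can be "more significant"
# than a strong correlation from a small one.
print(round(t_statistic(0.31, 500), 1))  # weak r, n = 500
print(round(t_statistic(0.64, 30), 1))   # strong r, n = 30
```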

When Standard Correlation Doesn’t Apply

The most commonly used correlation measure (Pearson’s) assumes the relationship between two variables is linear, meaning they change at a consistent rate relative to each other. But some relationships are curved. Exercise and health benefits, for instance, don’t follow a straight line: the gains from going from sedentary to moderately active are much larger than the gains from moderate to extreme activity.

For these curved but still consistent relationships, a different measure called Spearman’s correlation is more appropriate. It captures whether two variables reliably move in the same direction, even if not at a constant rate. If you see a low standard correlation but a high Spearman correlation, it typically means a real relationship exists but isn’t a straight line. This distinction matters because a standard correlation can miss strong patterns that don’t happen to be linear.
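A minimal sketch of the idea: Spearman’s correlation is just Pearson’s correlation applied to the ranks of the data rather than the raw values. Here a perfectly consistent but sharply curved relationship (y = eˣ) gets a modest standard correlation but a perfect Spearman correlation. The helpers are hand-rolled and, for simplicity, ignore tied values:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based rank of each value (ties not handled; fine for this demo)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def spearman_rho(xs, ys):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Curved but perfectly consistent: y always rises with x, just not linearly.
xs = list(range(1, 11))
ys = [math.exp(x) for x in xs]

print(round(pearson_r(xs, ys), 2))     # noticeably below 1: Pearson penalizes the curve
print(round(spearman_rho(xs, ys), 2))  # 1.0: the ranks move in perfect lockstep
```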