How Do You Find r in Stats: Formula and Calculation

In statistics, “r” refers to the Pearson correlation coefficient, a number between -1 and +1 that measures how strongly two variables are linearly related. You find r by comparing how far each data point falls from its variable’s mean, then standardizing that comparison. The calculation can be done by hand, in a spreadsheet, or with a single function in most statistical software.

What r Actually Tells You

The correlation coefficient captures two things at once: the direction of a relationship and its strength. The sign tells you direction. A positive r means both variables increase together (more hours studied, higher test scores). A negative r means one variable increases while the other decreases (more hours of exercise, lower resting heart rate).

The number itself tells you strength. Values closer to 1 or -1 indicate a tighter relationship, while values near 0 indicate little or no linear pattern. Here’s a general guide:

  • 0.7 to 1.0 (or -0.7 to -1.0): Strong correlation
  • 0.5 to 0.7 (or -0.5 to -0.7): Moderate correlation
  • 0.3 to 0.5 (or -0.3 to -0.5): Weak correlation
  • 0 to 0.3 (or 0 to -0.3): Negligible or no correlation

An r of exactly 1 means every data point falls perfectly on a straight line sloping upward. An r of -1 means a perfect straight line sloping downward. In real data, you’ll almost never see a perfect 1 or -1.

The Formula Behind r

The Pearson correlation coefficient is calculated with this logic: for each data point, measure how far x is from the mean of x and how far y is from the mean of y, then multiply those two distances together. Add up all those products. Finally, divide by a scaling factor that keeps the result between -1 and +1.

Written out, the formula is:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² × Σ(yᵢ − ȳ)² ]

In words: r equals the sum of (each x minus the x mean) times (each y minus the y mean), divided by the square root of the sum of squared x deviations times the sum of squared y deviations.

The numerator is essentially the covariance (the sums just haven’t been divided by the sample size), and it captures whether x and y tend to move in the same direction. The denominator adjusts for the spread of each variable so that the final number always lands on that -1 to +1 scale regardless of the units you’re working with.
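The whole formula fits in a few lines of plain Python. This is a minimal sketch using only the standard library; the function name `pearson_r` is chosen here for illustration, not a built-in:

```python
import math

def pearson_r(xs, ys):
    """Pearson's r, computed directly from the formula above."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of paired deviations from the means
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of (sum of squared x deviations
    # times sum of squared y deviations)
    den = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs)
        * sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den

print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```

Note that the scaling in the denominator is what keeps the result unit-free: multiply every x by 1,000 and r does not change.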

How to Calculate r by Hand

Suppose you have a small set of paired data points for variables x and y. Here’s the step-by-step process:

Step 1: Find the mean of all your x values and the mean of all your y values.

Step 2: For each data point, subtract the x mean from the x value. Do the same for y. These are your “deviations from the mean.”

Step 3: Multiply each x deviation by its paired y deviation. Add all these products together. This gives you the numerator of the formula.

Step 4: Square each x deviation and add them up. Square each y deviation and add them up. Multiply those two sums together, then take the square root. This gives you the denominator.

Step 5: Divide the result from Step 3 by the result from Step 4. That’s your r value.

For a quick example: if you have three data points where x = (1, 2, 3) and y = (2, 4, 6), the means are 2 and 4. The x deviations are -1, 0, 1. The y deviations are -2, 0, 2. The products are 2, 0, 2, summing to 4. The squared x deviations sum to 2, the squared y deviations sum to 8. The square root of 2 × 8 is 4. So r = 4/4 = 1.0, a perfect positive correlation, which makes sense because those points form a perfectly straight line.
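The same worked example can be traced step by step in Python, with each intermediate quantity matching the hand calculation above:

```python
x = [1, 2, 3]
y = [2, 4, 6]

# Step 1: means of x and y
mean_x = sum(x) / len(x)           # 2.0
mean_y = sum(y) / len(y)           # 4.0

# Step 2: deviations from the mean
dx = [xi - mean_x for xi in x]     # [-1.0, 0.0, 1.0]
dy = [yi - mean_y for yi in y]     # [-2.0, 0.0, 2.0]

# Step 3: sum of products of paired deviations (the numerator)
numerator = sum(a * b for a, b in zip(dx, dy))   # 2 + 0 + 2 = 4.0

# Step 4: sqrt of (sum of squared x deviations times
#         sum of squared y deviations) (the denominator)
denominator = (sum(a ** 2 for a in dx) * sum(b ** 2 for b in dy)) ** 0.5
# (2 * 8) ** 0.5 = 4.0

# Step 5: divide
r = numerator / denominator
print(r)  # 1.0
```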

Finding r in Excel, Python, and R

You rarely need to calculate r by hand outside of a classroom. Most tools have a built-in function that does it instantly.

In Excel or Google Sheets, use the CORREL function. If your x values are in cells A1 through A10 and your y values are in B1 through B10, type =CORREL(A1:A10, B1:B10). It returns the Pearson r directly.

In Python with the Pandas library, call .corr() on a DataFrame or between two Series. For example, df['x'].corr(df['y']) returns the Pearson r by default. You can also use NumPy’s np.corrcoef(x, y), which returns a correlation matrix where r sits in position [0, 1].
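For illustration, here is how both calls might look side by side on a small made-up dataset (the column names and values are invented for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "y": [2, 4, 5, 4, 6]})

# pandas: Pearson is the default method
r = df["x"].corr(df["y"])

# NumPy: corrcoef returns a 2x2 correlation matrix; r sits at [0, 1]
r_np = np.corrcoef(df["x"], df["y"])[0, 1]

print(round(r, 3), round(r_np, 3))  # both ≈ 0.853
```

Both calls give the same number; the only difference is that NumPy hands back the full correlation matrix rather than a single value.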

In the R programming language, the function is simply cor(x, y). This returns the Pearson coefficient by default. You can specify other methods like Spearman by adding the method argument, but for the standard r, the default works.

Requirements for r to Be Valid

Pearson’s r only works properly under certain conditions. If your data doesn’t meet these, the number you get can be misleading.

Both variables need to be continuous, meaning measured on a numerical scale where differences are meaningful (things like temperature, weight, test scores, or income). If one of your variables is categorical (like gender or color), Pearson’s r is the wrong tool.

The relationship between your variables needs to be roughly linear. If your scatterplot shows a curve, r can understate the true strength of the association. Two variables might have a strong U-shaped relationship yet produce an r near zero because the positive and negative portions cancel out. Always plot your data first.
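A tiny numeric demonstration of this cancellation, using made-up U-shaped data:

```python
import numpy as np

# A perfect U-shape: y depends strongly on x, but not linearly
x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = x ** 2   # [4, 1, 0, 1, 4]

r = np.corrcoef(x, y)[0, 1]
print(r)  # 0.0 — the left and right halves of the U cancel exactly
```

An r of zero here does not mean "no relationship"; it means no *linear* relationship, which is exactly why plotting first matters.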

Outliers can dramatically distort r. A single extreme point can pull the correlation toward 1 or push it toward 0, depending on where it falls. If your scatterplot shows one or two points far from the cluster, check whether removing them changes r substantially.
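A quick sketch of the outlier effect, with invented numbers:

```python
import numpy as np

# Five points with essentially no linear relationship
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 1, 4, 2, 3], dtype=float)
r_before = np.corrcoef(x, y)[0, 1]    # ≈ 0.14, negligible

# Add a single extreme point far from the cluster
x2 = np.append(x, 30.0)
y2 = np.append(y, 30.0)
r_after = np.corrcoef(x2, y2)[0, 1]   # ≈ 0.99, suddenly looks "strong"

print(round(r_before, 2), round(r_after, 2))
```

One point out of six turned a negligible correlation into an apparently near-perfect one, which is why comparing r with and without suspect points is a standard check.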

Each observation should also be independent, meaning one data point doesn’t influence another. If you’re measuring the same person repeatedly, or if your data points are clustered by group, standard Pearson correlation can give unreliable results. The data should also come from a random sample and be approximately normally distributed for the associated significance test (the p-value) to be trustworthy.

From r to r-Squared

Once you have r, you can get another widely used statistic by simply squaring it. This gives you R² (r-squared), also called the coefficient of determination. While r tells you the direction and strength of a relationship, R² tells you the percentage of variation in one variable that’s explained by the other.

If r = 0.80, then R² = 0.64, meaning 64% of the variation in y can be accounted for by the variation in x. The remaining 36% is due to other factors or random noise. This is especially useful in regression analysis, where you want to know how well your predictor actually explains the outcome. An r of 0.50 sounds decent, but squaring it reveals that only 25% of the variation is explained, which puts the strength of the relationship in more practical perspective.
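The conversion itself is a one-line computation; a small sketch (the helper name is chosen here for illustration):

```python
def r_to_r_squared(r):
    """Square the correlation coefficient to get the share of variance explained."""
    return r ** 2

print(round(r_to_r_squared(0.80), 2))  # 0.64 → 64% of variation explained
print(round(r_to_r_squared(0.50), 2))  # 0.25 → only 25% explained
```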