How to Find Sum of Squared Deviations From the Mean

The sum of squared deviations (often called the “sum of squares”) measures how spread out a set of numbers is. To find it, you subtract the mean from each data point, square each result, and add those squared values together. The formula is Σ(X − μ)², where X is each individual value and μ is the mean. Let’s walk through exactly how this works.

Why Squaring Matters

You might wonder why you can’t just add up the raw deviations (the distances from the mean) and call it a day. The problem is that some deviations are positive (values above the mean) and some are negative (values below the mean), and they always cancel each other out to zero. A dataset with wildly scattered values would look identical to one where every value is the same. Squaring each deviation solves this by making every value positive before you add them together. It also amplifies the effect of extreme values, which is useful because outliers genuinely do contribute more to the overall spread of your data.

Step-by-Step Calculation

Here’s the process, broken into four clear steps. We’ll use a small dataset to make it concrete: 2, 4, 6, 8, 10.

Step 1: Find the Mean

Add all the values together and divide by how many there are. For our dataset: (2 + 4 + 6 + 8 + 10) = 30, divided by 5 values, gives a mean of 6.

Step 2: Subtract the Mean From Each Value

These differences are called deviations. Values below the mean produce negative deviations, and values above the mean produce positive ones.

  • 2 − 6 = −4
  • 4 − 6 = −2
  • 6 − 6 = 0
  • 8 − 6 = 2
  • 10 − 6 = 4

Notice that if you added these deviations directly (−4 + −2 + 0 + 2 + 4), you’d get exactly zero. That’s the problem squaring fixes.
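You can verify this cancellation quickly. Here's a minimal Python sketch using the article's dataset (the variable names are just for illustration):

```python
data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)  # 30 / 5 = 6.0

# Raw deviations from the mean
deviations = [x - mean for x in data]
print(deviations)       # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0 -- the positives and negatives cancel exactly
```

This cancellation isn't a coincidence of this dataset; deviations from the mean sum to zero for any set of numbers, which is exactly why squaring is needed.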

Step 3: Square Each Deviation

  • (−4)² = 16
  • (−2)² = 4
  • (0)² = 0
  • (2)² = 4
  • (4)² = 16

Step 4: Add the Squared Deviations

16 + 4 + 0 + 4 + 16 = 40. That’s your sum of squared deviations. For this dataset, SS = 40.
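The four steps above translate directly into a few lines of Python (a sketch, with variable names chosen for readability):

```python
data = [2, 4, 6, 8, 10]

# Step 1: find the mean
mean = sum(data) / len(data)                    # 6.0

# Steps 2 and 3: subtract the mean, then square each deviation
squared_devs = [(x - mean) ** 2 for x in data]  # [16.0, 4.0, 0.0, 4.0, 16.0]

# Step 4: add the squared deviations
ss = sum(squared_devs)
print(ss)  # 40.0
```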

Organizing Your Work With a Table

When you’re working with more than a handful of values, a table keeps things clean. Set up four columns: the original value (X), the mean (μ), the deviation (X − μ), and the squared deviation (X − μ)². Fill in each row, then sum the last column. This approach dramatically reduces mistakes, especially during exams or when you’re double-checking your work. For longer datasets, it also makes it easy to spot where you went wrong if your final number doesn’t look right.
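If you'd like the same tabular layout without doing the arithmetic by hand, a short Python loop can print it (a sketch; the column widths are arbitrary):

```python
data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)

# Header row mirrors the four columns described above
print(f"{'X':>4} {'mean':>6} {'X-mean':>8} {'(X-mean)^2':>11}")
total = 0.0
for x in data:
    dev = x - mean
    sq = dev ** 2
    total += sq
    print(f"{x:>4} {mean:>6} {dev:>8} {sq:>11}")

print(f"Sum of squared deviations: {total}")  # 40.0
```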

How Sum of Squares Connects to Variance

The sum of squared deviations is not usually the end goal. It’s the numerator of both variance and standard deviation, which are far more commonly reported. To turn a sum of squares into a variance, you divide it by the number of data points. For our example: 40 ÷ 5 = 8. That’s the population variance. Take the square root of variance and you get the standard deviation.

This is where a critical distinction comes in. If your data represents an entire population (every value you care about), you divide by N. If your data is a sample drawn from a larger population, you divide by N − 1 instead. This adjustment, known as Bessel’s correction (dividing by the degrees of freedom, N − 1, rather than by N), compensates for the fact that a sample tends to underestimate the true spread. For our five-number example treated as a sample, the variance would be 40 ÷ 4 = 10 instead of 8.

The sum of squared deviations itself, however, is calculated the same way regardless of whether you’re working with a population or a sample. The only difference shows up in the next step, when you divide.
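Python's standard library implements both conventions, so you can check the hand calculation against it. A minimal sketch:

```python
import statistics

data = [2, 4, 6, 8, 10]

# The sum of squares is the same either way
ss = sum((x - statistics.mean(data)) ** 2 for x in data)  # 40.0

# Only the divisor differs
pop_var = ss / len(data)         # population: divide by N    -> 8.0
samp_var = ss / (len(data) - 1)  # sample: divide by N - 1    -> 10.0

# The standard library agrees with both
print(statistics.pvariance(data))  # 8.0  (population variance)
print(statistics.variance(data))   # 10.0 (sample variance)
```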

The Shortcut Formula

There’s an alternative version of the formula that skips the step of calculating individual deviations. Instead of subtracting the mean from every value, you square each original value, sum those squares, then subtract a correction factor. The formula is: SS = ΣX² − (ΣX)²/N. For our dataset:

  • ΣX²: 4 + 16 + 36 + 64 + 100 = 220
  • (ΣX)²/N: 30² / 5 = 900 / 5 = 180
  • SS: 220 − 180 = 40

Same answer, fewer steps. This version is especially useful when the mean is a messy decimal, because you never have to subtract it from each data point. It’s the formula most statistics textbooks call the “computational formula” as opposed to the “definitional formula” we used earlier.
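The computational formula is just as easy to express in code. A sketch with the same dataset:

```python
data = [2, 4, 6, 8, 10]
n = len(data)

sum_x = sum(data)                    # ΣX   = 30
sum_x_sq = sum(x * x for x in data)  # ΣX²  = 220

# SS = ΣX² - (ΣX)²/N
ss = sum_x_sq - sum_x ** 2 / n       # 220 - 900/5
print(ss)  # 40.0
```

One caveat worth knowing: with floating-point data whose values are large relative to their spread, this shortcut can lose precision, because it subtracts two large, nearly equal numbers. The definitional formula is numerically safer in that case.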

Using Excel or Google Sheets

If you’d rather let a spreadsheet handle it, Excel and Google Sheets both have a built-in function called DEVSQ. It returns the sum of squared deviations directly. The syntax is straightforward: =DEVSQ(A1:A5), where A1:A5 is the range containing your data. You can also list values manually, like =DEVSQ(2,4,6,8,10). The function accepts up to 255 arguments or a single array reference.

If you want to build it from scratch in a spreadsheet (which can be helpful for seeing the intermediate steps), put your data in column A and calculate the mean in a helper cell, say B1, with =AVERAGE(A1:A5). Then, in a new column, use a formula like =(A1-$B$1)^2 to get each squared deviation. Sum that column, and you have your answer.

Where Sum of Squares Shows Up

Beyond variance and standard deviation, the sum of squared deviations is a building block for many statistical methods. In regression analysis, the total sum of squares (SSTO) quantifies how much your outcome variable varies overall. It gets partitioned into the sum of squares explained by your model and the sum of squares left unexplained (the error). Comparing these pieces is the foundation of analysis of variance (ANOVA), which tests whether group means are meaningfully different from each other.

In simpler terms: any time a statistical test asks “how much variation is there, and where does it come from?”, the sum of squared deviations is doing the heavy lifting behind the scenes. Understanding how to calculate it by hand gives you a much clearer picture of what those tests are actually measuring.