Central tendency is a single number that represents the “center” or most typical value of an entire dataset. It’s one of the first concepts in statistics because it solves a basic problem: when you have dozens, thousands, or millions of data points, you need a way to summarize them into something meaningful. The three most common measures of central tendency are the mean, the median, and the mode, and each one captures the center of your data in a different way.
The Mean: Adding Everything Up
The mean is what most people think of as the “average.” You calculate it by adding all the values in a dataset and dividing by how many values there are. If five students scored 70, 80, 85, 90, and 95 on an exam, the mean is (70 + 80 + 85 + 90 + 95) ÷ 5 = 84.
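The same calculation can be written in Python. This is a minimal sketch that shows the explicit add-and-divide version next to the standard library's `statistics.mean`:

```python
from statistics import mean

scores = [70, 80, 85, 90, 95]

# Explicit version: add every value, then divide by how many there are
manual_mean = sum(scores) / len(scores)

print(manual_mean)   # 84.0
print(mean(scores))  # 84
```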
The mean uses every single data point in its calculation, which makes it the most information-rich measure of central tendency. It also has a special mathematical property: if you subtract the mean from every value in your dataset and add up those differences, the result is always zero, and the mean is the only number for which this is true. In plain terms, the values above the mean and below it perfectly balance out. This property makes the mean essential for many other statistical calculations.
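The balancing property can be checked directly on the exam scores. This sketch subtracts the mean from each score and sums the differences. For contrast, it also measures deviations from 85, a number that is not the mean, and those do not cancel out:

```python
from statistics import mean

scores = [70, 80, 85, 90, 95]
m = mean(scores)  # 84

# Deviation of each score from the mean
deviations = [x - m for x in scores]
print(deviations)       # [-14, -4, 1, 6, 11]
print(sum(deviations))  # 0

# Deviations from any other number leave something over
print(sum(x - 85 for x in scores))  # -5
```

With floating-point data the sum may come out as a tiny leftover such as 1e-15 rather than exactly 0. That is rounding error, not an exception to the property.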
The downside is that the mean treats every value equally, including extreme ones. A single very high or very low number can drag the mean away from where most of the data actually sits. This sensitivity to outliers is the main reason you’ll sometimes see a different measure used instead.
The Median: The Middle Value
The median is the value that sits exactly in the middle when you line up all your data from smallest to largest. Half the values fall below it, half above. Finding it is straightforward: sort your data, then pick the center point. If you have an odd number of values, it’s the one right in the middle. If you have an even number, you take the two middle values and average them.
Consider the sorted set (4, 5, 6, 8, 8, 9, 23). There are seven values, so the median is the 4th one: 8. Now take the sorted set (1, 2, 2, 3, 4, 7, 9, 10). There are eight values, so the median is the average of the 4th and 5th: (3 + 4) ÷ 2 = 3.5.
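The sort-then-pick procedure is short enough to write out by hand. Here `median_by_hand` is a hypothetical helper written for illustration, and its results are checked against the standard library's `statistics.median`:

```python
import statistics

def median_by_hand(values):
    """Sort the data, then take the middle value (odd count)
    or the average of the two middle values (even count)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                     # single middle value
    return (data[mid - 1] + data[mid]) / 2   # average of the two middle values

odd = [4, 5, 6, 8, 8, 9, 23]
even = [1, 2, 2, 3, 4, 7, 9, 10]

print(median_by_hand(odd))   # 8
print(median_by_hand(even))  # 3.5

# The standard library agrees
assert median_by_hand(odd) == statistics.median(odd)
assert median_by_hand(even) == statistics.median(even)
```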
The median’s big advantage is its resistance to outliers. In the dataset (1, 1, 1, 2, 4, 5, 5, 5, 6, 11, 41), the value 41 pulls the mean up to 7.45, well above where most values cluster. The median, however, is 5. Remove the outlier entirely and the mean drops by 45% (from 7.45 to 4.1), while the median changes by only 10% (from 5 to 4.5). When data is skewed, the median often gives a more honest picture of where the typical value lies.
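The outlier comparison can be reproduced with a short script. It computes both measures with and without the value 41, then prints the relative change in each:

```python
from statistics import mean, median

data = [1, 1, 1, 2, 4, 5, 5, 5, 6, 11, 41]
trimmed = [x for x in data if x != 41]  # same data with the outlier removed

before_mean, before_median = mean(data), median(data)
after_mean, after_median = mean(trimmed), median(trimmed)

print(round(before_mean, 2), before_median)  # 7.45 5
print(after_mean, after_median)              # 4.1 4.5

# Relative change caused by removing one extreme value
print(f"mean drops by {(before_mean - after_mean) / before_mean:.0%}")          # 45%
print(f"median drops by {(before_median - after_median) / before_median:.0%}")  # 10%
```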
The Mode: The Most Frequent Value
The mode is simply the value that appears most often. In the set (3, 7, 7, 7, 12, 15), the mode is 7. It’s the only one of the three measures that works with categorical data, where values are labels with no numeric value or natural order. If you surveyed 200 people about their favorite color and 65 said blue, 50 said red, and so on, the mode is “blue.” You can’t calculate a mean or median for colors, but you can identify which one came up most frequently.
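Python's `statistics.mode` works on both numbers and non-numeric labels, and `collections.Counter` shows the underlying counts. The color answers below are a small made-up sample for illustration, not the survey described above:

```python
from collections import Counter
from statistics import mode

print(mode([3, 7, 7, 7, 12, 15]))  # 7

# Made-up survey answers: mode still works where mean and median can't
answers = ["blue", "red", "blue", "green", "blue", "red"]
print(mode(answers))                    # blue
print(Counter(answers).most_common(1))  # [('blue', 3)]
```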
A dataset can have more than one mode. If two values tie for the highest frequency, the data is called bimodal. Three tied values make it trimodal, and with more than three the data is usually just called multimodal, a term also used for any dataset with more than one mode. It’s also possible to have no mode at all. If every value in a dataset appears exactly once, every value ties for most frequent, so listing them all as modes would just restate the entire dataset.
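Ties can be detected with `statistics.multimode` (Python 3.8+), which returns every value sharing the highest count. The helper `describe_modes` below is hypothetical. It attaches the labels from the text, plus the common term "unimodal" for a single mode, and reports "no mode" when every value appears exactly once:

```python
from collections import Counter
from statistics import multimode

def describe_modes(values):
    """Hypothetical helper: label a dataset by how many values
    tie for the highest frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("no data")
    if len(counts) > 1 and max(counts.values()) == 1:
        return "no mode", []  # every value appears exactly once
    modes = multimode(values)  # all values sharing the top count
    labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return labels.get(len(modes), "multimodal"), modes

print(describe_modes([3, 7, 7, 7, 12, 15]))   # ('unimodal', [7])
print(describe_modes([1, 1, 2, 5, 5, 9]))     # ('bimodal', [1, 5])
print(describe_modes([2, 2, 4, 4, 6, 6, 8]))  # ('trimodal', [2, 4, 6])
print(describe_modes([1, 2, 3, 4]))           # ('no mode', [])
```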
How Skewed Data Shifts the Three Measures
In a perfectly symmetrical distribution with a single peak, like a bell curve, the mean, median, and mode are all equal. They sit right at the center. Real-world data is rarely that tidy.
When data is skewed to the right (a long tail stretching toward high values), the mean gets pulled in that direction, away from where most values cluster. The median shifts less, and the mode stays at the peak. So in a right-skewed distribution, the order from lowest to highest is typically mode, median, mean. When data is skewed to the left, the pattern reverses: the mean gets pulled toward the low end, so it falls below the median, which falls below the mode.
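The usual ordering can be seen on a pair of made-up datasets. The first has a long tail toward high values, and the second is its mirror image. They were constructed so the textbook pattern holds, and real skewed data does not always follow it so neatly, which is why the text says "typically":

```python
from statistics import mean, median, mode

# Made-up data with a long right tail, and its mirror image
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 15]
left_skewed = [16 - x for x in right_skewed]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, mode(data), median(data), mean(data))
# right 2 3.0 4.5
# left 14 13.0 11.5

# Right skew: mode < median < mean; left skew: mean < median < mode
assert mode(right_skewed) < median(right_skewed) < mean(right_skewed)
assert mean(left_skewed) < median(left_skewed) < mode(left_skewed)
```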
This relationship is useful as a quick diagnostic. If the mean of your data is noticeably higher than the median, you likely have a right-skewed distribution with some unusually large values pulling the average up.
Why Household Income Uses the Median
The classic real-world example of choosing the right measure comes from income data. The Social Security Administration reports both average and median wages, and the median is consistently and substantially lower than the average. The reason is that income distribution is heavily right-skewed: a relatively small number of very high earners pulls the mean upward, making it look like the “typical” worker earns more than they actually do.
If you hear that the average household income in a city is $95,000, you might picture most families earning around that amount. But if a handful of millionaires live there, they inflate that average considerably. The median, perhaps $65,000, tells you what the household right in the middle actually earns. It’s a far more useful number for understanding what life looks like for a typical resident.
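The effect can be reproduced with a small hypothetical city of ten households: nine with ordinary incomes and one high earner. The figures are invented, chosen so that the mean and median match the $95,000 and $65,000 used above:

```python
from statistics import mean, median

# Hypothetical household incomes: one high earner among ten households
incomes = [38_000, 45_000, 52_000, 58_000, 63_000,
           67_000, 72_000, 80_000, 95_000, 380_000]

avg = mean(incomes)
mid = median(incomes)
below = sum(1 for x in incomes if x < avg)  # households earning less than the mean

print(f"mean:   ${avg:,.0f}")  # mean:   $95,000
print(f"median: ${mid:,.0f}")  # median: $65,000
print(f"{below} of {len(incomes)} households earn less than the mean")  # 8 of 10
```

One large value is enough to lift the mean above most of the households, while the median stays among the ordinary incomes.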
Which Measure to Use
The right choice depends on two things: the type of data you have and the shape of its distribution.
- Categorical data (like favorite color, political party, or college major) can only use the mode. These categories have no numeric value and no natural order, so a mean or median can't be calculated.
- Ranked or ordinal data (like satisfaction ratings of “low, medium, high” or class rankings) works best with the median. The values have a natural order, so you can find the middle, but the gaps between ranks aren’t necessarily equal, which makes averaging them misleading.
- Numeric data with a symmetrical distribution is best summarized by the mean. It uses every data point and is the basis for further calculations such as variance and standard deviation.
- Numeric data that is skewed or contains outliers is better represented by the median, since it won’t be distorted by extreme values.
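For ordinal data, one way to find the median is to map each label to its rank, take the middle rank, and map it back to a label. This is a sketch with made-up survey responses. It uses `statistics.median_low` so that an even number of responses yields an actual category instead of an average of two ranks, which could land between categories. Picking the lower middle value is one convention, and `statistics.median_high` gives the other:

```python
from statistics import median_low

# Ordinal scale: order matters, but gaps between levels aren't measured
levels = ["low", "medium", "high"]
rank = {label: i for i, label in enumerate(levels)}

# Made-up satisfaction responses
responses = ["low", "high", "medium", "high", "medium", "high"]

# Convert labels to ranks, take the (lower) middle rank, convert back
middle_rank = median_low(rank[r] for r in responses)
print(levels[middle_rank])  # medium
```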
In practice, reporting more than one measure often gives the clearest picture. When the mean and median of a dataset are close together, you know the data is fairly symmetrical and the average is trustworthy. When they diverge, that gap itself tells a story about the shape of the data and where the outliers lie.
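Reporting the measures side by side is easy to automate. The `summarize` function below is a hypothetical helper that returns the mean, the median, and the gap between them. It does not apply a cutoff for what counts as "close", since that depends on the data's scale:

```python
from statistics import mean, median

def summarize(values):
    """Hypothetical helper: report mean and median together,
    with their difference as a rough symmetry check."""
    m, med = mean(values), median(values)
    return {"mean": m, "median": med, "gap": m - med}

# Exam scores: mean and median are close together
print(summarize([70, 80, 85, 90, 95]))
# Data with an outlier: the mean sits well above the median
print(summarize([1, 1, 1, 2, 4, 5, 5, 5, 6, 11, 41]))
```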

