Box and Whisker Plot: Distribution, Outliers, and Skewness

A box and whisker plot shows you five key values about a dataset at a glance: the minimum, the first quartile, the median, the third quartile, and the maximum. Together, these five numbers reveal where your data is centered, how spread out it is, whether it leans to one side, and whether any values are unusually far from the rest. It’s one of the most efficient ways to summarize a batch of numbers visually.

The Five-Number Summary

Every box plot is built from five values, known together as the five-number summary:

  • Minimum: the smallest value in the dataset
  • First quartile (Q1): the median of the lower half of the data, so roughly 25% of values fall below it
  • Median (Q2): the middle value that splits the dataset in half
  • Third quartile (Q3): the median of the upper half, so roughly 75% of values fall below it
  • Maximum: the largest value in the dataset

These five numbers divide your data into four chunks, each containing roughly 25% of the observations. The chunks are equal in count, not in width, and that alone tells you a lot: if one chunk spans a wide range while another is compressed into a tiny one, you immediately know the data isn’t evenly distributed.
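As a minimal sketch, the function below computes the five-number summary using the median-of-halves definition from the list above. It assumes the overall median is excluded from both halves when the count is odd; other conventions, including the interpolated percentiles many statistics packages use, can give slightly different quartiles. The function name and the sample scores are illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumption: when the number of values is odd, the overall median is
    left out of both halves before their medians are taken.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]            # values below the middle position
    upper = data[half + n % 2:]    # skips the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

# Made-up exam scores for illustration.
scores = [62, 71, 75, 78, 80, 84, 88, 90, 95]
print(five_number_summary(scores))  # (62, 73.0, 80, 89.0, 95)
```

In this sample the top chunk (Q3 to maximum) spans 6 points while the bottom chunk (minimum to Q1) spans 11, a small example of chunks with equal counts but unequal widths.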

What Each Part of the Plot Represents

The rectangular box runs from Q1 to Q3. A horizontal line inside the box marks the median. Because Q1 and Q3 define the boundaries, the box captures the middle 50% of your data. The height (or width, if drawn horizontally) of this box is the interquartile range, or IQR, which is one of the most useful measures of how spread out your data is.

The whiskers are the lines extending from each end of the box. In the simplest version, one whisker stretches down to the minimum value and the other stretches up to the maximum. A more common approach caps each whisker at 1.5 times the IQR beyond its edge of the box: the whisker extends to the most extreme data value that still falls within that limit. Any data points beyond the limit are plotted individually as dots. Those isolated dots are your outliers.
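To make the capped-whisker rule concrete, here is a small sketch, assuming Q1 and Q3 have already been computed by whatever quartile convention you use. The function name `whisker_ends` and the sample data are hypothetical. Each whisker ends at the most extreme value inside its limit, not at the limit itself, which is why the two whiskers on one plot often have different lengths.

```python
def whisker_ends(values, q1, q3, k=1.5):
    """Return (low_whisker, high_whisker, outliers) under the k x IQR rule.

    q1 and q3 are passed in so any quartile convention can be used.
    Each whisker stops at the most extreme data value inside its limit;
    values strictly beyond a limit are reported as outliers.
    """
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    outliers = sorted(v for v in values if v < lower_limit or v > upper_limit)
    return min(inside), max(inside), outliers

# Made-up data; Q1 = 84 and Q3 = 90 by the median-of-halves method.
data = [50, 82, 84, 85, 86, 88, 89, 90, 91, 120]
print(whisker_ends(data, q1=84, q3=90))  # (82, 91, [50, 120])
```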

How It Reveals Outliers

The 1.5 IQR rule gives you a concrete, repeatable way to flag unusual values. Here’s how it works: calculate the IQR by subtracting Q1 from Q3, then multiply that number by 1.5. Subtract the result from Q1 to get your lower fence, and add it to Q3 for your upper fence. Any observation beyond either fence is considered an outlier.

For example, if Q1 is 80 and Q3 is 90, the IQR is 10. Multiply by 1.5 to get 15. Your lower fence is 65 and your upper fence is 105. A score of 60 or 110 would appear as an isolated point beyond the whisker, immediately visible as an extreme value. A bar chart of averages, or a single mean, can’t show you this.
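The fence arithmetic from this example can be checked in a few lines. This is a sketch: `iqr_fences` is a hypothetical helper name, and `k=1.5` is the multiplier described above.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = iqr_fences(80, 90)
print(lower, upper)  # 65.0 105.0
for score in (60, 70, 110):
    flagged = score < lower or score > upper
    print(score, "outlier" if flagged else "within fences")
```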

Reading Skewness and Symmetry

The position of the median line inside the box, combined with the relative lengths of the whiskers, tells you whether your data is symmetric or skewed. In a perfectly symmetric distribution, the median sits near the center of the box and both whiskers are roughly equal in length.

When the data is skewed to the right (meaning a long tail of high values), the top whisker stretches much longer than the bottom one and the median line drifts toward the bottom of the box. The opposite pattern, a longer bottom whisker with the median rising toward the top of the box, signals a left skew. You can spot these patterns in a fraction of a second, which makes box plots especially useful for scanning large datasets quickly.
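These visual cues can also be put into numbers. The sketch below uses Bowley’s quartile skewness coefficient, a standard statistic not discussed elsewhere in this article, to measure where the median sits inside the box, plus a simple ratio of whisker lengths. The function name and the example summary are made up for illustration, and there is no universal cutoff for calling a ratio "skewed".

```python
def box_asymmetry(q1, med, q3, low_whisker, high_whisker):
    """Put numbers on the visual skewness cues of a box plot.

    bowley: Bowley's quartile skewness, (Q3 + Q1 - 2*median) / (Q3 - Q1),
        ranging from -1 to 1. Positive means the median sits closer to
        the bottom of the box (right skew); negative means closer to the top.
    whisker_ratio: upper whisker length divided by lower whisker length;
        values well above 1 point to a longer upper tail. Returns inf if
        the lower whisker has zero length.
    """
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("box has zero height; quartile skewness is undefined")
    bowley = (q3 + q1 - 2 * med) / iqr
    upper_len = high_whisker - q3
    lower_len = q1 - low_whisker
    whisker_ratio = upper_len / lower_len if lower_len > 0 else float("inf")
    return bowley, whisker_ratio

# Hypothetical right-skewed summary (for example, incomes in thousands).
print(box_asymmetry(q1=20, med=25, q3=40, low_whisker=15, high_whisker=70))
# (0.5, 6.0)
```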

Whisker length relative to box length also carries information. Long whiskers compared to a short box suggest heavy tails: the middle half of the data is tightly packed, but some values stretch far from the center. Short whiskers relative to a tall box suggest light tails: the middle half is spread out, but values beyond it don’t extend much farther.

Comparing Multiple Groups

Box plots really shine when you place several of them side by side. If you want to compare test scores across five classrooms, or blood pressure readings across different age groups, lining up box plots on the same scale lets you compare medians, spreads, and skewness in one view. Histograms can do this too, but they take up far more space and become hard to read when you’re comparing more than two or three groups at once.
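Lining up box plots largely comes down to comparing each group’s median and IQR on a common scale. The sketch below computes both for a few hypothetical classrooms with Python’s `statistics.quantiles` using `method="inclusive"`; that interpolation convention is an assumption here and can differ slightly from the median-of-halves definition. The group names and scores are made up.

```python
from statistics import quantiles

def group_summaries(groups):
    """Map each group name to (median, IQR) for a side-by-side comparison."""
    out = {}
    for name, values in groups.items():
        # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
        out[name] = (med, q3 - q1)
    return out

# Hypothetical test scores for three classrooms.
classes = {
    "A": [70, 72, 75, 78, 80],
    "B": [60, 70, 80, 90, 100],
    "C": [85, 86, 88, 90, 91],
}
for name, (med, iqr) in group_summaries(classes).items():
    print(f"{name}: median={med}, IQR={iqr}")
```

Even this small table shows what side-by-side boxes would: class C has the highest median and the tightest spread, while class B is centered lower and spread far more widely.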

A study on infant birth weights, for instance, used comparative box plots to show that infants who survived and those who did not typically had different birth weight distributions. The difference was visible immediately: the boxes occupied different positions along the scale, with little overlap. That kind of compact, quickly absorbed comparison is what makes box plots a standard tool in medical research, education data, and quality control.

Notched Box Plots

Some box plots include a narrowing, or notch, around the median line. The notch is an approximate confidence interval for the median, sized so that if the notches of two box plots don’t overlap, their medians differ at roughly the 95% confidence level rather than just by chance. Overlapping notches don’t guarantee the medians are the same, but non-overlapping notches are a strong visual signal of a real difference. It’s a quick, informal way to eyeball statistical significance without running a formal test.
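Notch extents are commonly computed as median ± c × IQR / √n, with c given as 1.57 or 1.58 depending on the software. The sketch below treats c as a parameter and checks whether two notches overlap; the function names and the group summaries are hypothetical.

```python
import math

def notch_interval(med, q1, q3, n, c=1.57):
    """Return (low, high) for a notch of median +/- c * IQR / sqrt(n)."""
    half_width = c * (q3 - q1) / math.sqrt(n)
    return med - half_width, med + half_width

def notches_overlap(a, b):
    """True if the (low, high) intervals a and b share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical summaries for two groups of 100 observations each.
group_1 = notch_interval(med=50, q1=40, q3=60, n=100)
group_2 = notch_interval(med=58, q1=48, q3=68, n=100)
print(group_1, group_2, notches_overlap(group_1, group_2))
```

Because the half-width shrinks with √n, the same IQR produces narrower notches in larger samples, so a given gap between medians becomes easier to distinguish.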

What Box Plots Don’t Show

Box plots summarize distribution shape, but they hide some details. You can’t see how many data points are in the dataset, and you can’t tell whether the data within each quartile is clumped at one end or evenly spread. Two very different datasets can produce identical box plots if they happen to share the same five-number summary. A dataset with 20 points and one with 20,000 points could look the same.
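The sketch below builds two made-up datasets of nine values each: one clumped at the quartile values and one spread more evenly. Both yield the same five-number summary (quartiles from `statistics.quantiles` with `method="inclusive"`, one convention among several), and neither has outliers under the 1.5 IQR rule, so their box plots would be identical even though the data differ.

```python
from statistics import quantiles

def five_numbers(values):
    """(min, Q1, median, Q3, max) with quartiles from statistics.quantiles."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

# Made-up datasets: one clumped at the quartiles, one spread more evenly.
clumped = [0, 2, 2, 2, 5, 8, 8, 8, 10]
spread = [0, 1, 2, 4, 5, 6, 8, 9, 10]

print(five_numbers(clumped))  # (0, 2.0, 5.0, 8.0, 10)
print(five_numbers(spread))   # (0, 2.0, 5.0, 8.0, 10)
```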

For this reason, some analysts overlay individual data points on top of the box plot, or pair it with a histogram or density curve when sample size and fine-grained distribution matter. The box plot gives you the structural overview. Other charts fill in the texture.