To draw a box plot, you need five statistics known as the five-number summary: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum. These five values define every part of the plot, from the edges of the box to the tips of the whiskers. If you want to flag outliers, you’ll also need one derived value called the interquartile range, or IQR.
The Five-Number Summary
Each of the five statistics maps directly to a visual element on the plot:
- Minimum: the smallest value in your dataset. This sets the endpoint of the lower whisker (unless outliers are present).
- First quartile (Q1): the value below which 25% of the data falls. This forms the bottom edge of the box.
- Median (Q2): the middle value of the dataset, with half of the observations falling below it and half above. This is the line drawn inside the box.
- Third quartile (Q3): the value below which 75% of the data falls. This forms the top edge of the box.
- Maximum: the largest value in the dataset. This sets the endpoint of the upper whisker (unless outliers are present).
The box itself spans from Q1 to Q3, so its height (or width, if horizontal) represents the interquartile range. That middle 50% of your data lives inside the box, with the median line splitting it.
How to Calculate Each Statistic
Before calculating anything, sort your data from smallest to largest. Every statistic in the five-number summary depends on the rank order of your values.
The minimum and maximum are straightforward: just pick the smallest and largest numbers. The median is the middle value when the dataset has an odd number of points, or the average of the two middle values when it’s even.
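The odd/even rule for the median is easy to get wrong by one position, so here is a minimal sketch of it in Python. The function name `median_of_sorted` and the sample values are made up for illustration, and the input is assumed to be non-empty.

```python
def median_of_sorted(values):
    """Return the median of a non-empty list that is already sorted ascending."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return values[mid]
    # Even count: average the two middle values.
    return (values[mid - 1] + values[mid]) / 2


data = sorted([7, 3, 9, 1, 5, 8])  # hypothetical sample values
minimum = data[0]    # smallest value
maximum = data[-1]   # largest value
median = median_of_sorted(data)
```

With six values there is no single middle point, so the median here is the average of the third and fourth sorted values.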
Q1 and Q3 require a bit more work. One reliable method uses a formula to find the exact zero-based index position in your sorted data, where the first value sits at position 0:
True index location = (number of data points − 1) × percentile
For Q1, the percentile is 0.25. For Q3, it’s 0.75. If the index lands exactly on a whole number, the value at that position in your sorted list is your quartile. If it lands between two positions, you interpolate between the values at those positions: lower value + (higher value − lower value) × the decimal portion of the index. For example, if your true index is 2.75, you take the value at position 2, then add 75% of the difference between the values at positions 2 and 3.
As a quick example: for a dataset with 13 values, the Q1 index would be (13 − 1) × 0.25 = 3.0, so Q1 is simply the value at position 3 in the sorted list, counting from 0 (the fourth value). The Q3 index would be 12 × 0.75 = 9.0, so Q3 is the value at position 9 (the tenth value).
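The index formula and the interpolation step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the function name `quantile_linear` and the 13 sample values are invented here, and positions are counted from 0, so position 3 is the fourth value in the sorted list. Other quartile conventions exist and can give slightly different results.

```python
import math


def quantile_linear(sorted_values, p):
    """Quantile via the (n - 1) * p index rule with linear interpolation.

    Assumes sorted_values is non-empty, sorted ascending, and 0 <= p <= 1.
    """
    index = (len(sorted_values) - 1) * p
    low = math.floor(index)      # position just below (or at) the index
    high = math.ceil(index)      # position just above (or at) the index
    fraction = index - low       # decimal portion of the index
    low_value = sorted_values[low]
    high_value = sorted_values[high]
    return low_value + (high_value - low_value) * fraction


# Hypothetical sorted dataset with 13 values.
data = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 19, 21, 25]
q1 = quantile_linear(data, 0.25)  # index 3.0, so the value at position 3
q3 = quantile_linear(data, 0.75)  # index 9.0, so the value at position 9

# A four-value dataset where the index falls between positions.
small_q1 = quantile_linear([1, 2, 3, 4], 0.25)  # index 0.75
```

For the 13-value dataset both indexes are whole numbers, so no interpolation happens. For `[1, 2, 3, 4]` the Q1 index is 0.75, so the result is the value at position 0 plus 75% of the gap to the value at position 1.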
The Interquartile Range and Outlier Fences
The IQR is not one of the five core statistics, but you calculate it from two of them: IQR = Q3 − Q1. This value measures the spread of the middle half of your data, and it’s essential for determining where the whiskers actually end on most standard box plots.
The most common convention, introduced by John Tukey, uses the IQR to set “fences” for outliers. Any data point below Q1 − 1.5 × IQR is a low outlier, and any point above Q3 + 1.5 × IQR is a high outlier. In a Tukey-style box plot, the whiskers don’t extend to the raw minimum and maximum. Instead, each whisker reaches to the most extreme data point that still falls within those fences. Outliers beyond the fences are plotted as individual dots.
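To make the fence rule concrete, here is a small Python sketch that computes the fences, finds where each whisker would end, and collects the outliers. The function name `tukey_whiskers`, its `k` parameter, and the sample data are assumptions for illustration. Q1 and Q3 are passed in rather than computed, since the quartile method can vary.

```python
def tukey_whiskers(sorted_values, q1, q3, k=1.5):
    """Return (whisker_low, whisker_high, outliers) for a Tukey-style box plot.

    Assumes sorted_values is sorted ascending and that q1 and q3 were
    computed from the same data.
    """
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Points within the fences; the whiskers end at the extremes of these.
    inside = [v for v in sorted_values if lower_fence <= v <= upper_fence]
    # Points beyond the fences are drawn as individual dots.
    outliers = [v for v in sorted_values if v < lower_fence or v > upper_fence]
    return inside[0], inside[-1], outliers


# Hypothetical sorted data with one unusually large value.
data = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 19, 21, 60]
whisker_low, whisker_high, outliers = tukey_whiskers(data, q1=7, q3=16)
```

Here the IQR is 9, so the fences sit at −6.5 and 29.5. The value 60 lies beyond the upper fence and is treated as an outlier, while the upper whisker stops at 21, the largest value still inside the fences.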
A simpler style (sometimes called Spear style) skips the outlier detection entirely and extends the whiskers all the way to the true minimum and maximum. If you’re drawing a box plot by hand for a class assignment, check which convention your instructor expects, because the whisker endpoints will differ.
Do You Need the Mean?
No. The mean is not required for a standard box plot. Box plots are built entirely around the median and quartiles, which makes them especially useful for skewed data where the mean can be misleading. Some software tools offer an option to overlay the mean as a separate marker (often a dot or diamond inside the box), but this is an optional addition, not part of the standard plot.
Reading Skewness From the Statistics
Once you have the five-number summary, you can already tell whether your data is skewed before you even draw the plot. In a perfectly symmetric dataset, the median sits exactly in the center of the box, meaning Q3 − median equals median − Q1.
If the median is closer to Q1 (shifted toward the bottom of the box), the data is right-skewed, meaning there’s a longer tail of higher values stretching upward. If the median is closer to Q3, the data is left-skewed, with a longer tail of lower values. The whisker lengths usually reinforce this: in right-skewed data, the upper whisker tends to be noticeably longer than the lower one.
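The comparison described above boils down to checking which half of the box is longer. A rough sketch in Python follows; the function name `box_skew_hint` and the labels it returns are invented for illustration. It looks only at the box, not the whiskers, so treat the result as a hint rather than a formal skewness test.

```python
def box_skew_hint(q1, median, q3):
    """Suggest a skew direction from where the median sits inside the box."""
    lower_half = median - q1   # distance from Q1 up to the median
    upper_half = q3 - median   # distance from the median up to Q3
    if upper_half > lower_half:
        # Median sits closer to Q1: longer stretch on the high side.
        return "right-skewed"
    if upper_half < lower_half:
        # Median sits closer to Q3: longer stretch on the low side.
        return "left-skewed"
    return "symmetric"


hint = box_skew_hint(q1=7, median=9, q3=16)  # hypothetical quartile values
```

With real measurements the two halves will rarely be exactly equal, so in practice you might compare them with a tolerance instead of strict equality.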
Putting It All Together
Here’s the full checklist of what you need, in the order you’d calculate it:
- Sorted data: arrange all values from lowest to highest.
- Minimum: the first value in the sorted list.
- Q1: the 25th percentile.
- Median: the 50th percentile.
- Q3: the 75th percentile.
- Maximum: the last value in the sorted list.
- IQR: Q3 − Q1 (needed only if you’re marking outliers).
- Lower fence: Q1 − 1.5 × IQR (optional, for Tukey-style whiskers).
- Upper fence: Q3 + 1.5 × IQR (optional, for Tukey-style whiskers).
With these values in hand, the plot draws itself. The box runs from Q1 to Q3, a line crosses it at the median, and the whiskers extend outward to either the min/max or the nearest data points within the fences. Any values beyond the fences get their own individual dots. That’s the entire anatomy of a box plot, built from just five core numbers and one simple calculation.
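As one possible way to run through the whole checklist in code, the sketch below leans on Python’s standard `statistics` module. It assumes Python 3.8 or later, where `statistics.quantiles` accepts `method="inclusive"`; that method interpolates using the same (number of data points − 1) × percentile index described earlier. The function name `box_plot_stats`, the dictionary keys, and the sample data are made up for illustration.

```python
import statistics


def box_plot_stats(values, k=1.5):
    """Collect the numbers needed to draw a box plot in either whisker style."""
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # Tukey-style whisker ends: most extreme points within the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical unsorted sample with one unusually large value.
summary = box_plot_stats([13, 2, 60, 7, 21, 4, 16, 5, 19, 8, 14, 10, 11])
```

For whiskers that run to the true extremes, use `summary["min"]` and `summary["max"]`. For Tukey-style whiskers, use `summary["whisker_low"]` and `summary["whisker_high"]` and plot `summary["outliers"]` as individual dots.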

