A normal probability plot is a graph that plots your data points against the values you’d expect if the data were perfectly normally distributed. If the points fall close to a straight diagonal line, your data is approximately normal. If they curve or bend away from that line, the data deviates from a normal distribution in specific, readable ways. Making one is straightforward in any statistical software, and reading one correctly is even more useful than creating it.
Why Use a Probability Plot Instead of a Histogram
Histograms seem like the obvious way to check whether data is bell-shaped, but they have a serious weakness: they change shape depending on how you set the bins. The same dataset can look roughly normal or clearly skewed just by adjusting the breakpoints along the axis. Penn State’s statistics program demonstrates this directly, showing that identical data produces very different visual impressions when histogram bin widths change.
A normal probability plot avoids this problem entirely. There are no bins to adjust. The data either falls along the reference line or it doesn’t, making the judgment far less subjective. It’s also more sensitive to deviations in the tails of the distribution, which is where departures from normality matter most for many statistical tests.
How the Plot Works
The concept behind a normal probability plot is simpler than it looks. Your data values are sorted from smallest to largest and plotted on one axis (usually the vertical axis). On the other axis, the plot places the “theoretical quantiles,” which are the values a perfectly normal distribution would produce for a dataset of that size. Each data point gets paired with its expected normal counterpart.
If your data is truly normal, every point lines up with its theoretical match, and you see a straight line. The slope of that line reflects the standard deviation of your data, and where it crosses the center reflects the mean. Deviations from the line tell you exactly how your data differs from normal, and where.
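The pairing described above can be sketched directly. This is a minimal illustration using a small made-up sample and the common (i - 0.5) / n plotting positions; software such as scipy uses a slightly different internal formula, but the idea is the same:

```python
import numpy as np
from scipy import stats

# Small made-up sample; any numeric vector works.
x = np.array([52.1, 48.3, 55.7, 50.2, 61.0, 45.9, 49.8, 53.4])

n = len(x)
sorted_x = np.sort(x)                      # empirical quantiles, smallest to largest
probs = (np.arange(1, n + 1) - 0.5) / n    # plotting position for each rank
theo = stats.norm.ppf(probs)               # theoretical standard-normal quantiles

# Each sorted value paired with its expected normal counterpart:
for t, v in zip(theo, sorted_x):
    print(f"{t:+.3f} -> {v}")
```

Plotting theo on the horizontal axis against sorted_x on the vertical axis reproduces the normal probability plot by hand.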
Reading Common Patterns
If the points fall below the reference line at the low end and above it at the high end, forming a stretched S, your data has heavier tails than a normal distribution. This is common with financial data or measurement data that includes extreme values. The opposite S-shape, with points above the line at the low end and below it at the high end, means lighter tails than normal.
If the points curve away from the line in one consistent direction, your data is skewed. An overall convex pattern, bending upward and most visible at the high end, indicates right skew; an overall concave pattern, bending downward at the low end, signals left skew. One or two points that sit far from the line while the rest follow it closely suggest outliers rather than a fundamental departure from normality.
NIST’s engineering statistics handbook recommends generating a normal probability plot before running any formal outlier test. The reason: if your data isn’t normal to begin with, an outlier test might flag points that are perfectly expected for a non-normal distribution. The tails of the probability plot help you distinguish between a dataset that has genuine outliers and one that simply follows a different distribution shape.
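The tail sensitivity can also be seen numerically. One hedged sketch is to compare probplot's fitted correlation for a normal sample against a heavy-tailed one; the t-distribution sample here is simply a stand-in for any heavy-tailed data, and the seed is arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_data = rng.normal(size=500)
heavy_data = rng.standard_t(df=3, size=500)  # heavier tails than normal

# fit=True (the default) returns (slope, intercept, r) for the fitted line.
_, (_, _, r_normal) = stats.probplot(normal_data)
_, (_, _, r_heavy) = stats.probplot(heavy_data)

print(f"normal r = {r_normal:.4f}, heavy-tailed r = {r_heavy:.4f}")
```

The heavy-tailed sample scores a noticeably lower r, driven almost entirely by the points at the extremes of the plot.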
Q-Q Plots vs. P-P Plots
You’ll encounter two related types of probability plots. A Q-Q (quantile-quantile) plot compares quantiles, which are the actual data values at each rank. A P-P (probability-probability) plot compares cumulative probabilities instead. Both test normality, but they’re sensitive to different parts of the distribution.
P-P plots are better at detecting differences in the middle of the distribution, where probability density is highest and cumulative probabilities change rapidly. Q-Q plots are better at detecting differences in the tails. For most practical purposes, the Q-Q plot is preferred because tail behavior is where normality assumptions break down most consequentially, and because Q-Q plots work naturally across distributions that differ in location and scale. When people say “normal probability plot,” they almost always mean a Q-Q plot against the normal distribution.
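The two constructions can be computed side by side from one sample. A sketch, assuming the normal's parameters are estimated from the data (which the P-P version requires, since unlike the Q-Q version it cannot ignore location and scale):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.sort(rng.normal(loc=50, scale=10, size=200))  # synthetic sample
n = len(x)
probs = (np.arange(1, n + 1) - 0.5) / n
mu, sd = x.mean(), x.std(ddof=1)

# Q-Q: theoretical quantiles vs. the observed quantiles (the data values).
qq_x = stats.norm.ppf(probs, loc=mu, scale=sd)
qq_y = x

# P-P: theoretical cumulative probabilities vs. empirical ones.
pp_x = stats.norm.cdf(x, loc=mu, scale=sd)
pp_y = probs
```

Plotting (qq_x, qq_y) spreads the tails out along the axes, while (pp_x, pp_y) compresses them into the corners near 0 and 1, which is why Q-Q plots show tail misbehavior more clearly.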
Making the Plot in R
R has built-in functions for this, no extra packages needed. The function qqnorm() takes a vector of your data and plots it against theoretical normal quantiles. The function qqline() adds the reference line.
qqnorm(your_data) creates the plot with your sorted data on the vertical axis and normal quantiles on the horizontal axis. qqline(your_data) draws a straight line through the first and third quartiles, giving you the visual benchmark to judge deviations.
For comparing your data against a distribution other than the normal, use qqplot(), which takes two arguments: a set of theoretical values and your data. But for checking normality specifically, qqnorm() is all you need.
A minimal working example:
data <- rnorm(100, mean=50, sd=10)
qqnorm(data)
qqline(data)
This generates 100 random normal values and plots them. Since the data is drawn from a normal distribution, the points should hug the reference line closely. Replace data with your own variable to check real data.
Making the Plot in Python
In Python, the scipy.stats.probplot function handles normal probability plots. You’ll also need matplotlib for the visual output.
import scipy.stats as stats
import matplotlib.pyplot as plt
res = stats.probplot(your_data, plot=plt)
plt.show()
By default, probplot compares your data against a normal distribution and fits a least-squares regression line through the points. Passing plot=plt tells the function to actually render the chart. Without that argument, the function only returns the calculated values without displaying anything.
The function returns a tuple containing the ordered data values, the theoretical quantiles, and (if fit=True, which is the default) the slope, intercept, and correlation coefficient of the best-fit line. That correlation coefficient, often called r, gives you a numeric summary: values very close to 1 mean the data closely follows a normal distribution.
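Extracting those returned values looks like this. The data here is a synthetic stand-in with known mean 50 and standard deviation 10, so the fitted slope and intercept should land near those values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
your_data = rng.normal(loc=50, scale=10, size=100)  # stand-in for real data

(theoretical, ordered), (slope, intercept, r) = stats.probplot(your_data)

# slope estimates the standard deviation, intercept the mean,
# and r summarizes how closely the points follow the line.
print(f"slope={slope:.2f}  intercept={intercept:.2f}  r={r:.4f}")
```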
To test against a different distribution, pass it through the dist parameter. For example, stats.probplot(your_data, dist=stats.t, sparams=(5,), plot=plt) would compare your data against a t-distribution with 5 degrees of freedom.
Making the Plot in Excel
Excel doesn’t have a built-in probability plot function, but you can construct one manually. Sort your data from smallest to largest in one column. In the next column, calculate the cumulative probability for each rank using the formula (i - 0.5) / n, where i is the rank (1, 2, 3, …) and n is the total number of data points. In a third column, convert those probabilities to normal quantiles using Excel’s NORM.S.INV() function on each probability value.
Then create a scatter plot with the normal quantiles on the horizontal axis and your sorted data on the vertical axis. Add a trendline to serve as the reference line. If the points follow the trendline closely, your data is approximately normal. This manual approach takes more setup than R or Python, but it works for smaller datasets when you don’t have access to statistical software.
How Many Data Points You Need
Normal probability plots become more informative with more data. With fewer than 20 or so observations, even truly normal data can produce plots with noticeable wobble around the reference line, making it hard to judge whether deviations are meaningful. Around 30 to 50 points, the plot starts to give reliable visual signals. With several hundred points, even subtle departures from normality become visible.
For very small samples, don’t over-interpret slight curves or gaps. Focus on whether the overall pattern is roughly linear rather than whether every point sits exactly on the line. Large, systematic departures (clear S-curves, sharp bends, or points that break away dramatically at the tails) are meaningful at any sample size.
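A quick simulation makes the sample-size effect concrete. This sketch draws repeated truly-normal samples at several sizes and records the lowest probplot correlation observed; the seed and repetition count are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# For each sample size, draw many truly-normal samples and keep the
# lowest probability-plot correlation seen; small samples wobble more.
worst_r = {}
for n in (10, 50, 500):
    rs = [stats.probplot(rng.normal(size=n))[1][2] for _ in range(200)]
    worst_r[n] = min(rs)
    print(n, round(worst_r[n], 3))
```

Even though every sample is genuinely normal, the worst-case r at n = 10 is far lower than at n = 500, which is exactly the wobble described above.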

