What Does the Shapiro-Wilk Test Show?

The Shapiro-Wilk test shows whether a set of data follows a normal distribution, the familiar bell-shaped curve where most values cluster around the average. It gives you a p-value and a test statistic (called W) that together tell you if your data’s shape deviates significantly from what you’d expect if it were normally distributed. This matters because many common statistical methods only produce reliable results when the underlying data is at least approximately normal.

What the Test Actually Measures

The Shapiro-Wilk test compares your actual data to what a perfectly normal distribution with the same average and spread would look like. It does this by computing a W statistic, which essentially measures how well your ordered data points match the expected pattern of a normal curve. W ranges from 0 to 1, where a value of 1 means your data perfectly matches a normal distribution. Small values of W indicate a departure from normality.

More specifically, the test takes your data, sorts it from smallest to largest, and then checks how closely that ordered sequence lines up with the theoretical spacing you’d see in a normal sample of the same size. The formula, first proposed by Samuel Shapiro and Martin Wilk in 1965, uses pre-calculated weights based on what normal data “should” look like for your sample size: W = (Σ aᵢ x₍ᵢ₎)² / Σ (xᵢ − x̄)², where the x₍ᵢ₎ are your values in sorted order and the aᵢ are those weights. The numerator captures how well your data matches those expectations, and the denominator measures the total variability in your data. If the ratio is high, your data looks normal. If it’s low, something is off.
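You can see the W statistic behave this way with SciPy’s built-in implementation. Here is a minimal sketch (the sample sizes, distributions, and random seed are illustrative) comparing a truly normal sample with a heavily skewed one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Sample drawn from a genuinely normal distribution: W should sit near 1
normal_sample = rng.normal(loc=50, scale=10, size=40)
w_normal, p_normal = stats.shapiro(normal_sample)

# Heavily right-skewed sample (exponential): W should be noticeably lower
skewed_sample = rng.exponential(scale=10, size=40)
w_skewed, p_skewed = stats.shapiro(skewed_sample)

print(f"normal sample: W = {w_normal:.3f}, p = {p_normal:.3f}")
print(f"skewed sample: W = {w_skewed:.3f}, p = {p_skewed:.3f}")
```

`scipy.stats.shapiro` returns both the W statistic and the p-value, so a single call gives you everything discussed in the next section.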

How to Read the Results

The test produces two key outputs: the W statistic and a p-value. The p-value is what most people focus on, and interpreting it is straightforward once you understand the test’s starting assumption.

The null hypothesis of the Shapiro-Wilk test is that your data comes from a normal distribution. This means the test assumes normality until proven otherwise. Here’s how to use the p-value:

  • p-value greater than 0.05: You don’t have enough evidence to reject normality. Your data is consistent with a normal distribution, and you can proceed with statistical methods that assume normality.
  • p-value less than 0.05: The distribution is significantly non-normal. Your data departs enough from the bell curve that you should consider using non-parametric methods instead.
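The decision rule above can be sketched as a small helper. The function name `check_normality`, the 0.05 cutoff, and the sample measurements are all illustrative, not part of any standard API:

```python
from scipy import stats

def check_normality(data, alpha=0.05):
    """Run Shapiro-Wilk and return W, the p-value, and a plain-English
    verdict at the chosen significance level. (Hypothetical helper.)"""
    w, p = stats.shapiro(data)
    if p > alpha:
        verdict = "consistent with normality (fail to reject)"
    else:
        verdict = "significantly non-normal (reject normality)"
    return w, p, verdict

# Illustrative data: ten measurements clustered around 13-14
measurements = [12.1, 14.3, 13.8, 12.9, 15.0, 13.2, 14.7, 12.5, 13.9, 14.1]
w, p, verdict = check_normality(measurements)
print(f"W = {w:.3f}, p = {p:.3f} -> {verdict}")
```

Keep the caveat from the paragraph below in mind: a “consistent with normality” verdict is an absence of evidence against the bell curve, not proof of it.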

A common mistake is reading a high p-value as “proof” that your data is normal. It isn’t. It just means the test didn’t detect a significant departure. With a very small sample, the test may lack the power to detect real deviations, so a non-significant result doesn’t guarantee normality.

Why Normality Matters for Other Tests

The reason people run the Shapiro-Wilk test in the first place is that many widely used statistical methods assume the data is normally distributed. T-tests, ANOVA, Pearson correlation, and linear regression all rely on this assumption to varying degrees. If your data isn’t normal and you run these tests anyway, your p-values and confidence intervals may be unreliable.

Checking normality with Shapiro-Wilk is typically one of the first steps in any analysis that involves these parametric methods. If the test flags your data as non-normal, you’d generally switch to alternatives: a Mann-Whitney U test instead of an independent t-test, a Kruskal-Wallis test instead of ANOVA, or Spearman correlation instead of Pearson. These non-parametric alternatives don’t assume a bell curve and work with data of any shape.
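That switch can be wired up directly in code. This is a sketch under illustrative assumptions (skewed example data, a 0.05 cutoff, and two groups to compare), showing the t-test versus Mann-Whitney U branch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Illustrative data: both groups drawn from skewed (exponential) distributions
group_a = rng.exponential(scale=2.0, size=100)
group_b = rng.exponential(scale=3.0, size=100)

alpha = 0.05
both_normal = (stats.shapiro(group_a).pvalue > alpha
               and stats.shapiro(group_b).pvalue > alpha)

if both_normal:
    # Parametric route: independent t-test
    stat, p = stats.ttest_ind(group_a, group_b)
    test_used = "independent t-test"
else:
    # Non-parametric fallback: Mann-Whitney U
    stat, p = stats.mannwhitneyu(group_a, group_b)
    test_used = "Mann-Whitney U"

print(f"{test_used}: p = {p:.4f}")
```

With the skewed data above, the Shapiro-Wilk checks fail and the code falls through to Mann-Whitney U, which is exactly the substitution described in the text.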

Where the Test Works Best

The Shapiro-Wilk test is generally considered one of the most powerful normality tests available, meaning it’s better at correctly detecting non-normal data than many alternatives. It performs particularly well with small to moderate sample sizes, roughly 3 to 50 observations (the range the original 1965 formulation covered; modern implementations extend it to samples of several thousand), which is where many real-world datasets in research and business fall.

With very large samples (hundreds or thousands of data points), the test becomes extremely sensitive. It will flag even trivial, practically meaningless deviations from perfect normality as statistically significant. In real data, nothing is ever perfectly normal, so a large enough sample will almost always produce a significant result. This doesn’t necessarily mean the data is too non-normal to use parametric tests on. It just means the test is picking up on tiny imperfections that won’t affect your analysis in any meaningful way.

For this reason, statisticians recommend not relying on the Shapiro-Wilk test alone when your sample is large. Visual checks become more useful at that point.

Combining With Visual Methods

The Shapiro-Wilk test gives you a single reject-or-don’t-reject answer, but it doesn’t tell you how your data deviates from normality or what shape it actually takes. That’s why it’s best used alongside visual tools that let you see the distribution for yourself.

A Q-Q plot (quantile-quantile plot) is the most common companion. It plots your data points against where they would fall if the data were perfectly normal. If the points follow a straight diagonal line, your data is approximately normal. Curves or S-shapes reveal skewness or heavy tails. A histogram gives you a quick visual impression of the overall shape, letting you spot obvious skew or multiple peaks. Together, these tools tell you not just whether your data is non-normal, but in what way, which helps you decide what to do about it.

The best practice is to use all three: the Shapiro-Wilk p-value for a formal statistical check, a Q-Q plot for detail on where the deviations occur, and a histogram for the big picture. If all three point the same direction, you can be confident in your conclusion about the data’s shape.
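A combined check might look like the following sketch. It runs Shapiro-Wilk and then uses `scipy.stats.probplot` to compute the Q-Q plot coordinates along with `r`, the correlation of the plotted points with the fitted line (near 1 means the points hug the diagonal). The data and seed are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(loc=100, scale=15, size=60)  # illustrative sample

# 1. Formal check: Shapiro-Wilk
w, p = stats.shapiro(data)

# 2. Q-Q plot data: theoretical normal quantiles vs. ordered sample values.
#    r measures how closely the points follow the fitted straight line.
(theoretical_q, ordered_vals), (slope, intercept, r) = stats.probplot(data, dist="norm")

print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.3f}")
print(f"Q-Q fit:      r = {r:.3f}")

# In an interactive session you would also draw the pictures, e.g.:
#   import matplotlib.pyplot as plt
#   stats.probplot(data, dist="norm", plot=plt)  # Q-Q plot
#   plt.figure(); plt.hist(data, bins=15)        # histogram
#   plt.show()
```

Here the three signals (a non-significant p-value, Q-Q points on a straight line, and a bell-shaped histogram) would all agree, which is the agreement the paragraph above asks for before trusting a conclusion about the data’s shape.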