How to Test for Normality: Visual and Statistical Methods

Testing for normality means checking whether your data follow a bell-shaped (normal) distribution, an assumption behind many common statistical tests such as t-tests and ANOVA. There’s no single perfect method. The best approach combines visual inspection with a formal statistical test, and the right choice depends heavily on your sample size.

Start With Visual Methods

Before running any formal test, plot your data. Two plots give you the most information: a histogram and a Q-Q (quantile-quantile) plot.

A histogram shows you the overall shape of your data at a glance. You’re looking for the classic bell curve: roughly symmetric, with most values clustered around the center and tapering off evenly on both sides. Obvious skew (a long tail stretching one direction) or multiple peaks are immediate red flags. Histograms work well for getting a quick sense of your distribution, but they can be misleading with small samples because the shape changes depending on how you set the bin width.

A Q-Q plot is more diagnostic. It plots your data’s values against where those values would fall if the data were perfectly normal. If your data are normally distributed, the points form a roughly straight line. Deviations from that line tell you specific things about how your data differ from normal. Points that bow away from the line in a consistent arc (a C-shape) indicate skewness. Points that flare away from the line at both extremes in an S-shape suggest heavy tails: more outliers or extreme values than a normal distribution would predict. The Q-Q plot is particularly useful because it highlights problems in the tails of the distribution, which is exactly where violations of normality cause the most trouble for statistical tests.
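As a minimal sketch, `scipy.stats.probplot` computes the Q-Q plot coordinates (and a straight-line fit) without drawing anything, so you can inspect how straight the points are even before plotting; the data here are illustrative:

```python
# Sketch: Q-Q plot coordinates via scipy, no plotting required.
# probplot returns the plot points plus a least-squares line fit;
# the correlation r of the points is close to 1 for roughly normal data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=10, scale=2, size=200)  # illustrative sample

(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(data, dist="norm")

print(f"correlation of Q-Q points: r = {r:.3f}")
# To draw the plot: scatter theoretical_q against ordered_values
# (e.g. with matplotlib) and overlay the fitted line.
```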

Check Skewness and Kurtosis Values

Skewness measures how lopsided your distribution is. Kurtosis measures how heavy the tails are compared to a normal distribution. A perfectly normal distribution has a skewness of zero and an excess kurtosis of zero. Your data won’t hit those exactly, so the question is how far off is too far.

The thresholds depend on your sample size. For small samples (under 50), convert your skewness and kurtosis into z-scores (the statistic divided by its standard error) and check whether either exceeds 1.96 in absolute value; if so, your data likely aren’t normal. For medium samples (50 to 300), raise the cutoff to 3.29, which corresponds to a stricter significance level of about .001. For large samples over 300, skip the z-scores entirely and look at the raw values instead: an absolute skewness greater than 2 or an absolute excess kurtosis greater than 7 signals a substantial departure from normality.
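These size-dependent rules can be sketched in a few lines. The helper name and return shape below are made up for illustration; the z-scores come from scipy’s `skewtest` and `kurtosistest`, which compute the standard errors for you:

```python
# Sketch: size-dependent skewness/kurtosis screening using the
# thresholds above (1.96 under n=50, 3.29 up to 300, raw values beyond).
import numpy as np
from scipy import stats

def skew_kurt_check(data):
    """Hypothetical helper: returns (skewness, excess kurtosis, looks_normal)."""
    n = len(data)
    skew = stats.skew(data)
    kurt = stats.kurtosis(data)  # excess kurtosis: normal -> 0
    if n > 300:
        # Large samples: judge the raw values, not z-scores.
        ok = abs(skew) <= 2 and abs(kurt) <= 7
    else:
        z_skew = stats.skewtest(data).statistic
        z_kurt = stats.kurtosistest(data).statistic
        cutoff = 1.96 if n < 50 else 3.29
        ok = abs(z_skew) <= cutoff and abs(z_kurt) <= cutoff
    return skew, kurt, ok

rng = np.random.default_rng(0)
res_normal = skew_kurt_check(rng.normal(size=100))       # symmetric data
res_skewed = skew_kurt_check(rng.exponential(size=100))  # strongly right-skewed
print(res_normal, res_skewed)
```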

The Shapiro-Wilk Test

The Shapiro-Wilk test is widely considered the best general-purpose normality test. It works by measuring how well your data’s values correlate with the values you’d expect from a normal distribution. The null hypothesis is straightforward: your data come from a normal distribution. A small p-value (typically below 0.05) means you reject that assumption.

Originally, the test was limited to samples smaller than 50, but later modifications extended its range to samples up to 5,000. It consistently outperforms other normality tests in comparative studies, which is why many statisticians recommend it as the default choice. In R, you can run it with the base function shapiro.test() or the pipe-friendly shapiro_test() from the rstatix package. In Python, it’s available through scipy.stats.shapiro().
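A minimal Python example, using the `scipy.stats.shapiro` call mentioned above on made-up data:

```python
# Sketch: Shapiro-Wilk in Python. Null hypothesis: the sample comes
# from a normal distribution; a small p-value rejects that assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=5, scale=1.5, size=80)  # illustrative data

stat, p = stats.shapiro(sample)
print(f"W = {stat:.4f}, p = {p:.4f}")
```

The W statistic lies between 0 and 1, with values near 1 indicating a close match to normality.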

One important caveat: all formal normality tests, including Shapiro-Wilk, have low statistical power when sample sizes are small (30 or below). At those sizes, power for all major normality tests drops below 40%, so the test often fails to detect real departures from normality. This means a non-significant result with a small sample doesn’t prove your data are normal; it just means the test couldn’t tell.

Other Statistical Tests

The Kolmogorov-Smirnov (K-S) test compares your data against a fully specified distribution. The catch is that it requires you to know the exact mean and standard deviation of the distribution you’re testing against, not estimates from your sample. If you estimate those parameters from your data (which is almost always the case), the standard K-S test gives incorrect p-values. The Lilliefors correction fixes this problem by adjusting the critical values for the case where mean and standard deviation are estimated from the data. If you’re using SPSS, the software automatically applies the Lilliefors correction when running a K-S normality test.
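The pitfall is easy to reproduce. The sketch below runs the plain K-S test against a normal distribution whose mean and standard deviation were estimated from the very same data, which is exactly the situation where the standard p-value is not valid (it is biased toward non-rejection); the Lilliefors-corrected version lives in, for example, `statsmodels.stats.diagnostic.lilliefors`:

```python
# Sketch: the plain K-S test with parameters estimated from the data.
# The p-value printed here is NOT trustworthy, because kstest assumes
# the reference distribution was fully specified in advance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.exponential(scale=1.0, size=40)  # clearly non-normal data

stat, p_naive = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))
print(f"naive K-S: D = {stat:.3f}, p = {p_naive:.3f}  (needs Lilliefors correction)")
```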

The Anderson-Darling test is a modified version of the K-S test that gives more weight to the tails of the distribution. This makes it more sensitive to outliers and extreme values. If your concern is specifically about whether the tails of your distribution behave normally (which matters for tests that are sensitive to outliers), Anderson-Darling is a strong choice. The tradeoff is that it requires separate critical values for each type of distribution, making it less flexible as a general tool.
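In Python, `scipy.stats.anderson` implements this test. A quirk worth knowing: instead of a p-value, it returns the statistic alongside critical values at fixed significance levels, reflecting the test’s need for distribution-specific critical values:

```python
# Sketch: Anderson-Darling for normality via scipy. You compare the
# statistic against the critical value at your chosen significance level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.normal(size=150)  # illustrative data

result = stats.anderson(data, dist="norm")
print("A^2 statistic:", round(result.statistic, 3))
for crit, sig in zip(result.critical_values, result.significance_level):
    decision = "reject" if result.statistic > crit else "fail to reject"
    print(f"  at {sig}% level (critical value {crit}): {decision} normality")
```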

The D’Agostino-Pearson test takes a different approach entirely. Instead of comparing your data against a theoretical distribution, it combines two separate tests: one for skewness and one for kurtosis. The combined test statistic follows a chi-square distribution, which makes it easy to compute a p-value. This test is particularly useful when you want to know not just whether your data are non-normal, but specifically whether the problem is asymmetry, heavy tails, or both.
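This composition is visible in scipy, where the test is exposed as `scipy.stats.normaltest`: its statistic is the sum of the squared z-scores from the separate skewness and kurtosis tests, so you can inspect the two components on their own:

```python
# Sketch: D'Agostino-Pearson omnibus test. The combined statistic K2
# equals z_skew**2 + z_kurt**2 and follows a chi-square with 2 df.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(size=200)  # illustrative data

k2, p = stats.normaltest(data)                 # combined test
z_skew = stats.skewtest(data).statistic        # asymmetry component
z_kurt = stats.kurtosistest(data).statistic    # tail-weight component

print(f"K2 = {k2:.3f}, p = {p:.3f}, z_skew = {z_skew:.2f}, z_kurt = {z_kurt:.2f}")
```

Inspecting `z_skew` and `z_kurt` separately tells you whether a rejection comes from asymmetry, heavy tails, or both.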

The Large Sample Problem

Formal normality tests become increasingly sensitive as your sample size grows. With thousands of data points, even trivial, practically meaningless deviations from perfect normality will produce a significant p-value. Your data might be “close enough” to normal for any statistical test to work perfectly well, yet the Shapiro-Wilk test rejects normality because it detected a tiny wobble that has no real impact on your analysis.

This is why visual methods and skewness/kurtosis values become more important with large samples. A Q-Q plot that looks reasonably straight, combined with skewness below 2 and kurtosis below 7, is more informative than a p-value when you have hundreds or thousands of observations.
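The mismatch is easy to demonstrate. The sketch below draws 5,000 points from a Student’s t distribution with 10 degrees of freedom, which is nearly normal (symmetric, excess kurtosis of only about 1, far below the threshold of 7), yet Shapiro-Wilk flags it decisively at this sample size:

```python
# Sketch: the large-sample problem. A mild, practically harmless
# deviation from normality (t with 10 df) gets a tiny p-value at
# n = 5000, while skewness and kurtosis stay well inside |2| and |7|.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
data = rng.standard_t(df=10, size=5000)

stat, p = stats.shapiro(data)
print(f"Shapiro-Wilk p = {p:.2e}")
print(f"skewness = {stats.skew(data):.3f}, excess kurtosis = {stats.kurtosis(data):.3f}")
```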

A Common Misconception About Sample Size

You may have heard that with a large enough sample (often cited as 30 or more), you don’t need to worry about normality at all because of the central limit theorem. This is misleading. The central limit theorem says the distribution of sample means approaches normality as sample size increases. It does not say your actual data become normally distributed. If your underlying population is strongly skewed, your individual data points will still be skewed regardless of how many you collect. The normality assumption in tests like the t-test applies to the sampling distribution of the mean, not necessarily to the raw data, but the quality of that approximation depends on how non-normal the raw data are. For heavily skewed data or data with extreme outliers, simply having a large sample doesn’t automatically make parametric tests safe.
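The distinction can be shown directly: raw exponential data stay skewed no matter how many points you collect, while means of groups of those same points become nearly symmetric:

```python
# Sketch: the CLT normalizes sample MEANS, not the raw data.
# An exponential population has skewness 2; means of groups of 100
# have skewness roughly 2 / sqrt(100) = 0.2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
raw = rng.exponential(scale=1.0, size=100_000)   # raw data: skewed
means = raw.reshape(1000, 100).mean(axis=1)      # 1,000 sample means, n = 100 each

print(f"skewness of raw data:     {stats.skew(raw):.2f}")   # stays near 2
print(f"skewness of sample means: {stats.skew(means):.2f}") # shrinks toward 0
```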

Putting It Together

No single method is sufficient on its own. The most reliable approach uses multiple tools together. Start by plotting a histogram and a Q-Q plot to get a visual sense of your distribution. Then check your skewness and kurtosis values against the thresholds for your sample size. Finally, run a formal test, with Shapiro-Wilk as the default for most situations.

If all three methods agree, you can be confident in your conclusion. If they disagree, give more weight to the visual methods and skewness/kurtosis values for large samples, and lean more on the formal test for moderate samples (roughly 50 to 300). For very small samples under 30, recognize that none of these methods work particularly well. You may need to use nonparametric alternatives to your planned statistical test, or proceed with caution and report your normality assessment transparently.
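The numeric parts of this workflow can be bundled into one helper. The function name, return format, and thresholds below follow this article’s conventions and are not a standard API; the visual checks (histogram, Q-Q plot) still have to happen alongside it:

```python
# Sketch: a combined numeric normality report following the workflow above.
import numpy as np
from scipy import stats

def normality_report(data):
    """Hypothetical helper: skewness, kurtosis, and Shapiro-Wilk in one dict."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    report = {
        "n": n,
        "skewness": stats.skew(data),
        "excess_kurtosis": stats.kurtosis(data),
    }
    if n <= 5000:  # Shapiro-Wilk's practical range
        report["shapiro_p"] = stats.shapiro(data).pvalue
    if n > 300:
        # Large samples: raw-value thresholds beat p-values.
        report["skew_kurt_ok"] = (abs(report["skewness"]) <= 2
                                  and abs(report["excess_kurtosis"]) <= 7)
    if n <= 30:
        report["warning"] = "small sample: low power; consider nonparametric tests"
    return report

rng = np.random.default_rng(0)
print(normality_report(rng.normal(size=120)))
```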