What Is the Sign Test? Uses, Assumptions & Limits

The sign test is the simplest nonparametric statistical test, used to determine whether a sample’s median differs from a specific value or whether one condition tends to produce higher results than another. Unlike more common tests such as the t-test, it doesn’t require your data to follow a normal distribution. It works by reducing each data point to a plus or minus sign, then counting which sign appears more often.

How the Sign Test Works

The core logic is straightforward. You start with a reference value, often called the hypothesized median. For each observation in your data, you ask one question: is this value above or below the reference? If it’s above, it gets a “+” sign. If it’s below, it gets a “−” sign. Values exactly equal to the reference are typically dropped from the analysis.

Once every observation has a sign, you count the number of plus signs. This count is your test statistic, sometimes called B (the “sign statistic”). If the true median really does equal your reference value, you’d expect roughly half your observations to fall above and half below, meaning plus and minus signs should appear in roughly equal numbers. If the count of plus signs is unusually high or unusually low, that’s evidence the median is different from what you hypothesized.

Because the test boils down to counting successes out of trials with a 50/50 probability, the math behind it follows a binomial distribution with p = 0.5. For larger samples (generally when half the sample size is at least 10, i.e., n ≥ 20), you can use a normal approximation with mean n/2 and standard deviation √n/2 instead of calculating exact binomial probabilities.
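The whole procedure fits in a few lines of code. Below is a minimal sketch of the exact sign test using only Python's standard library; the function name `sign_test` and its `alternative` parameter are my own choices for illustration, not a standard API.

```python
import math

def sign_test(data, m0, alternative="two-sided"):
    """Exact one-sample sign test against a hypothesized median m0.

    Reduces each observation to +/- relative to m0, drops ties,
    and computes a binomial p-value with p = 0.5.
    """
    plus = sum(x > m0 for x in data)
    minus = sum(x < m0 for x in data)
    n = plus + minus  # observations equal to m0 are dropped

    def p_at_most(k):
        # P(X <= k) for X ~ Binomial(n, 0.5)
        return sum(math.comb(n, i) for i in range(k + 1)) / 2**n

    if alternative == "greater":   # H1: median > m0
        return 1 - p_at_most(plus - 1)
    if alternative == "less":      # H1: median < m0
        return p_at_most(plus)
    # two-sided: double the smaller tail, capped at 1
    return min(1.0, 2 * p_at_most(min(plus, minus)))
```

With 8 pluses and 2 minuses out of 10 usable observations, for example, the two-sided p-value comes out to 2 × 56/1024 ≈ 0.109.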

One-Sample and Paired-Sample Versions

The sign test comes in two flavors, though they use the same underlying logic.

In the one-sample version, you’re testing whether the median of a single population equals some specific value. For example, if you suspect the median commute time in a city is 30 minutes, you collect commute times from a random sample, mark each one as “+” (above 30) or “−” (below 30), and see whether the split is lopsided enough to reject 30 minutes as the true median.
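The commute-time scenario can be worked through directly. The sample below is hypothetical, invented purely to illustrate the counting:

```python
import math

# Hypothetical commute times (minutes) from a small random sample
times = [34, 41, 28, 52, 37, 45, 26, 39, 48, 31, 44, 36]
m0 = 30  # hypothesized median

plus = sum(t > m0 for t in times)   # above 30 minutes
minus = sum(t < m0 for t in times)  # below 30 minutes
n = plus + minus                    # ties with 30 would be dropped

# Two-sided exact binomial p-value with p = 0.5
tail = sum(math.comb(n, i) for i in range(min(plus, minus) + 1)) / 2**n
p = min(1.0, 2 * tail)
print(plus, minus, round(p, 4))  # → 10 2 0.0386
```

A 10-to-2 split is lopsided enough here that, at the usual 0.05 level, 30 minutes would be rejected as the median.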

In the paired-sample version, you compare two related measurements on the same subjects, such as blood pressure before and after a treatment. For each subject, you subtract the “before” score from the “after” score. If the difference is positive, it gets a “+”; if negative, a “−.” You then test whether the number of positive differences is significantly different from what you’d expect by chance. This version is common in clinical research, where one group of patients receives two treatments and the question is whether one treatment tends to produce better outcomes.
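The paired version differs only in where the signs come from. Here is the same counting applied to made-up before/after blood-pressure readings (the numbers are illustrative, not real clinical data):

```python
import math

# Hypothetical systolic blood pressure (mmHg) for 8 patients
before = [142, 150, 138, 161, 147, 155, 149, 158]
after  = [135, 148, 139, 150, 141, 149, 144, 151]

diffs = [a - b for a, b in zip(after, before)]
plus  = sum(d > 0 for d in diffs)   # pressure went up
minus = sum(d < 0 for d in diffs)   # pressure went down
n = plus + minus                    # zero differences are dropped

# Two-sided exact binomial p-value with p = 0.5
tail = sum(math.comb(n, i) for i in range(min(plus, minus) + 1)) / 2**n
p = min(1.0, 2 * tail)
print(plus, minus, round(p, 4))  # → 1 7 0.0703
```

Seven of eight differences are negative, suggesting the treatment lowers pressure, though with only eight pairs the two-sided p-value of about 0.07 falls just short of the conventional 0.05 threshold.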

What the Sign Test Assumes

The sign test has remarkably few assumptions, which is its biggest selling point. Your observations (or pairs) need to be independent of one another, and your variable should be continuous so that ties with the hypothesized median are unlikely. That’s essentially it. There’s no requirement that the data be normally distributed, no requirement for a specific shape of distribution, and no requirement that the data be measured on an interval or ratio scale. Ordinal data works fine as long as you can determine whether each value falls above or below the reference point.

If the distribution of your data happens to be symmetric, the sign test also functions as a valid test for the mean, since the mean and median are equal in symmetric distributions.

The Null and Alternative Hypotheses

The null hypothesis states that the population median equals some specified value (m = m₀). The alternative hypothesis depends on your research question. In a two-tailed test, the alternative is simply that the median does not equal m₀. In a one-tailed test, the alternative is that the median is either greater than or less than m₀.

For paired data, the null hypothesis states that neither condition is systematically favored, meaning the median of the paired differences is zero. A significant result tells you that one condition tends to produce higher values than the other. Crucially, it tells you the direction of the difference but nothing about the size of the difference.

Strengths of the Sign Test

The sign test’s main advantage is that it works in situations where other tests don’t. When your data are heavily skewed, when you have outliers that would distort a t-test, or when your sample is too small to verify normality, the sign test remains valid. It’s also useful for ordinal data where exact magnitudes aren’t meaningful, like patient satisfaction ratings or pain severity rankings.

Its simplicity makes it nearly impossible to misapply. You don’t need software to run it on a small dataset. Count the pluses, count the minuses, and look up the binomial probability.
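The "lookup" is ordinary binomial arithmetic that a table or a calculator handles just as well; expressed in code for concreteness:

```python
from math import comb

# Suppose 8 pluses and 2 minuses out of 10 usable observations.
# How likely is a split at least this lopsided under a fair coin?
tail = sum(comb(10, k) for k in (0, 1, 2)) / 2**10  # P(X <= 2) = 56/1024
p_two_sided = 2 * tail
print(round(p_two_sided, 4))  # → 0.1094
```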

Limitations and Statistical Power

The sign test’s simplicity comes at a real cost. By reducing every observation to a simple “+” or “−,” it throws away information about how far each observation falls from the reference value. An observation that’s 0.1 above the median and one that’s 100 above the median both receive the same “+” sign. This makes the test less efficient and reduces its statistical power, meaning it has a lower chance of detecting a real effect compared to tests that use the full data.

This power disadvantage is especially pronounced with small samples. Among common nonparametric tests, the sign test is typically the least powerful, and nonparametric methods in general have less power than parametric ones like the t-test when the parametric assumptions are met. In modern clinical trials, the sign test is not typically the analytical method of choice for this reason.

Tied observations also create practical problems. Because ties with the reference value (or zero differences in paired data) must be discarded, a study designed without accounting for them can end up underpowered, with too few usable data points.

Sign Test vs. Wilcoxon Signed-Rank Test

The Wilcoxon signed-rank test is the most common alternative to the sign test, and the two are frequently compared. Both are nonparametric and both work with paired or single-sample data. The key difference is that the Wilcoxon test accounts for magnitude. It doesn’t just record whether each difference is positive or negative; it also ranks the differences by their absolute size, giving more weight to larger differences.

This makes the Wilcoxon test more powerful in most situations. However, it adds an assumption: the distribution of differences should be roughly symmetric. The sign test makes no such assumption. So if your paired differences are heavily skewed or you’re working with purely ordinal data, the sign test is the safer choice. If the differences are reasonably symmetric and you want more power to detect a real effect, the Wilcoxon test is generally preferred.
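The power gap is easy to see on a small example. The sketch below runs both tests on the same hypothetical paired differences, computing exact p-values by brute force with only the standard library (for small n, the Wilcoxon null distribution can be enumerated over all 2^n sign patterns; in practice you would use a library routine such as scipy.stats.wilcoxon):

```python
import math
from itertools import combinations

# Hypothetical paired differences (after - before); no zeros, no tied |d|
diffs = [1.2, 0.4, -0.3, 2.1, 0.8, -0.5, 1.5, 0.9]
n = len(diffs)

# --- Sign test: uses only the signs of the differences ---
plus = sum(d > 0 for d in diffs)
tail = sum(math.comb(n, i) for i in range(min(plus, n - plus) + 1)) / 2**n
p_sign = min(1.0, 2 * tail)

# --- Wilcoxon signed-rank: also ranks the |differences| ---
order = sorted(range(n), key=lambda i: abs(diffs[i]))
ranks = [0] * n
for r, i in enumerate(order, start=1):
    ranks[i] = r  # rank 1 = smallest |difference|
w_minus = sum(ranks[i] for i in range(n) if diffs[i] < 0)
w = min(w_minus, n * (n + 1) // 2 - w_minus)  # smaller rank sum

# Exact two-sided p-value: enumerate every subset of ranks and
# count those whose sum is at least as extreme as the observed one.
count = sum(
    1
    for k in range(n + 1)
    for subset in combinations(range(1, n + 1), k)
    if sum(subset) <= w
)
p_wilcoxon = min(1.0, 2 * count / 2**n)

print(round(p_sign, 3), round(p_wilcoxon, 3))  # → 0.289 0.055
```

The same 6-to-2 split of signs gives a sign-test p-value near 0.29, while the Wilcoxon test, noticing that the two negative differences are also the two smallest in magnitude, lands near 0.055.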

When To Use the Sign Test

  • Small samples with unknown distributions: When you can’t verify normality and don’t want to assume it.
  • Ordinal data: When you can rank observations as above or below a value but can’t meaningfully measure distances between them.
  • Data with extreme outliers: Since the test ignores magnitude, an extreme value counts no more than any other observation.
  • Quick preliminary analysis: When you need a fast, low-assumption check before applying more sophisticated methods.
  • Paired comparisons in clinical settings: When testing whether one treatment is generally favored over another across matched pairs of patients.

The sign test fills a specific niche: it sacrifices precision for robustness. When your data meet the assumptions of a t-test or Wilcoxon test, those methods will give you more power to detect real effects. But when those assumptions are questionable, the sign test provides a reliable fallback that’s hard to get wrong.