Reporting a Mann-Whitney U test requires six core components: the test statistic (U), the sample sizes of both groups, descriptive statistics (typically medians and interquartile ranges), a Z-score for larger samples, an exact p-value, and an effect size. Getting all of these into your results section, formatted correctly, is what separates a publishable write-up from one that gets flagged in peer review.
Essential Values to Include
Every Mann-Whitney U report needs these pieces:
- Sample sizes for each group (n₁ and n₂)
- Descriptive statistics for each group, usually medians and interquartile ranges
- The U statistic
- The Z-score (standardized test statistic)
- The exact p-value
- An effect size, most commonly r
The U statistic itself represents a comparison of ranks between your two groups. Values near zero suggest the groups barely differ, while values approaching the product of the two sample sizes (n₁ × n₂) suggest strong separation. On its own, U is hard to interpret without context, which is why you always pair it with a p-value and effect size.
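To see how U behaves at its extremes, here is a minimal sketch using SciPy with invented data in which every score in one group is lower than every score in the other, producing U = 0 (complete rank separation):

```python
from scipy.stats import mannwhitneyu

# Hypothetical data for illustration: treatment scores all rank below control.
treatment = [1, 2, 3]
control = [4, 5, 6]

# mannwhitneyu returns the U statistic for the first sample and a p-value;
# method="exact" uses the exact distribution, appropriate for small samples.
res = mannwhitneyu(treatment, control, alternative="two-sided", method="exact")
print(res.statistic, res.pvalue)  # U = 0.0, p = 0.1
```

With n₁ = n₂ = 3 the maximum possible U is n₁ × n₂ = 9, so a U of 0 is the strongest separation these sample sizes allow; the exact two-sided p-value of .10 shows why U alone, without the p-value and effect size, tells an incomplete story.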
APA Formatting Rules
In APA style, statistical symbols like U, Z, p, M, and n are italicized. You do not need to define these abbreviations in your paper because they are considered standard statistical notation. Report the U statistic and Z-score to two decimal places. Report exact p-values to two or three decimal places (for example, p = .03 or p = .006), with one exception: when the p-value falls below .001, write p < .001 rather than reporting the full string of zeros.
Use a zero before the decimal point only for numbers that can exceed 1.0. Since p-values, effect sizes, and correlations are capped at 1.0, they never get a leading zero. Write p = .04, not p = 0.04. The U statistic, by contrast, can exceed 1.0, so it takes a leading zero on the rare occasions it falls below 1.0.
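These formatting rules are easy to mechanize. The helper below is an illustrative sketch (the function name is mine, not a standard API); it applies the no-leading-zero rule and the p < .001 floor, always using three decimal places:

```python
def format_p(p: float) -> str:
    """Format a p-value per APA style (illustrative helper, not a library API)."""
    if p < 0.001:
        return "p < .001"
    # Three decimals, then drop the leading zero: p-values cannot exceed 1.0.
    return "p = " + f"{p:.3f}".lstrip("0")

print(format_p(0.014))   # p = .014
print(format_p(0.0004))  # p < .001
```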
Why Medians and Interquartile Ranges Matter
The Mann-Whitney U is a rank-based test, so reporting means and standard deviations alongside it is a mismatch. Medians and interquartile ranges (IQRs) are the appropriate descriptive statistics because they reflect the same non-parametric logic the test uses.
There is one important nuance here. When the distributions of your two groups have similar shapes, you can describe the Mann-Whitney as a test of medians and report the group medians directly. When the distributions differ in shape or spread, the test is really comparing mean ranks, not medians, and your write-up should reflect that. A 2001 paper in the BMJ made this point explicitly: differences in spread between groups can be clinically important, and collapsing everything into a median comparison can hide that information. Look at box plots or histograms of both groups before deciding how to frame your results.
A clean way to report descriptive statistics in-text looks like this: “median survival was 16 days (IQR: 5 to 42.5) in the treatment group and 8 days (IQR: 3 to 18) in the control group.”
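With NumPy, the median and both IQR endpoints come from a single percentile call. The survival numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical survival times (days) for one group.
treatment_days = np.array([5, 9, 16, 30, 42, 50])

# One call returns the 25th, 50th, and 75th percentiles.
q1, median, q3 = np.percentile(treatment_days, [25, 50, 75])
print(f"median survival was {median:g} days (IQR: {q1:g} to {q3:g})")
```

Note that quartile conventions differ across packages (NumPy's default is linear interpolation; R alone offers nine quantile types), so small discrepancies between software outputs are normal and worth checking before you copy numbers into a manuscript.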
Calculating and Reporting Effect Size
The most commonly reported effect size for the Mann-Whitney U is r, calculated by dividing the Z-score by the square root of the total sample size:
r = Z / √N
Here, N is the combined sample size of both groups. Most statistical software gives you the Z-score directly, so the calculation takes seconds. Interpret the absolute value of r using Cohen’s thresholds: 0.1 or above is a small effect, 0.3 or above is medium, and 0.5 or above is large. Report it to two decimal places, and state the interpretation in plain language so readers can gauge practical significance alongside statistical significance.
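As a worked example of r = Z / √N, assume a hypothetical Z of -2.45 from a total sample of N = 49 (both values invented for illustration):

```python
import math

def effect_size_r(z: float, n_total: int) -> float:
    """r = |Z| / sqrt(N); interpret with Cohen's 0.1 / 0.3 / 0.5 thresholds."""
    return abs(z) / math.sqrt(n_total)

r = effect_size_r(-2.45, 49)  # hypothetical Z and N
print(round(r, 2))  # 0.35 -> a medium effect by Cohen's thresholds
```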
Full Reporting Templates
When Distributions Have Similar Shapes
When both groups share a similar distribution shape, you can compare medians directly. Here is how a significant result might read:
“Participants in the intervention group reported higher satisfaction scores (Mdn = 4.50, IQR: 3.20 to 5.80) than participants in the control group (Mdn = 3.10, IQR: 2.00 to 4.60). A Mann-Whitney U test indicated that this difference was statistically significant, U = 210.50, Z = -2.45, p = .014, r = .35.”
For a non-significant result with similar distributions:
“Median engagement scores for females (Mdn = 5.38) and males (Mdn = 5.58) were not statistically significantly different, U = 145, Z = -1.49, p = .142, r = .22.”
When Distributions Differ in Shape
If the two groups have differently shaped distributions, report mean ranks instead of medians:
“The distributions of engagement scores for the two groups were not similar in shape. Engagement scores for females (mean rank = 17.75) and males (mean rank = 23.25) were not statistically significantly different, U = 145, Z = -1.49, p = .142, r = .22.”
This distinction matters because two groups can have identical medians but very different rank distributions. Reporting mean ranks in that scenario gives readers an honest picture of the data.
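Mean ranks are computed by ranking the pooled data and averaging within each group. A sketch with invented scores (SciPy's `rankdata` assigns average ranks to ties by default, matching how the Mann-Whitney handles them):

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical engagement scores for illustration.
females = np.array([3.1, 4.2, 5.0, 5.9])
males = np.array([4.8, 6.1, 6.5, 7.0])

# Rank the pooled sample, then average the ranks within each group.
ranks = rankdata(np.concatenate([females, males]))
mean_rank_f = ranks[: len(females)].mean()
mean_rank_m = ranks[len(females):].mean()
print(mean_rank_f, mean_rank_m)  # 3.0 6.0
```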
Structuring Your Results Section
A complete write-up typically flows in three beats. First, state the descriptive statistics for each group so the reader can see the raw difference. Second, report the inferential test with U, Z, and p. Third, provide the effect size and interpret it. You can do all of this in two to three sentences.
Before the inferential results, briefly note whether the distributions of the two groups were similar in shape. One sentence is enough: “Inspection of the distributions showed similar shapes for both groups” or “The two groups differed in distributional shape.” This tells the reader why you chose to report medians or mean ranks, and it shows reviewers you checked an important assumption.
If you are reporting multiple Mann-Whitney tests (comparing several outcome variables between the same two groups), a table is more efficient than repeating the same sentence structure. In tables, include columns for each group’s median (or mean rank), U, Z, p, and r. Report exact p-values in the table, using “< .001” for anything below that threshold.
Visualizing the Results
Box plots are the standard companion to a Mann-Whitney U test. They display the medians, interquartile ranges, and outliers for both groups in a single image, which maps directly onto the statistics you report in the text. Label each box with the group name, and consider adding individual data points overlaid on the boxes if your sample sizes are small enough to make them readable.
Violin plots work well when you want to emphasize differences in distribution shape between the groups, which is particularly relevant given that the Mann-Whitney can detect spread differences as well as location shifts. If you noted that your two groups had differently shaped distributions, a violin plot communicates that visually in a way a box plot alone cannot.
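A minimal Matplotlib sketch of both plot types side by side, using simulated data (group names, sample sizes, and the output filename are all assumptions for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; figure is saved to a file
import matplotlib.pyplot as plt
import numpy as np

# Simulated data: the second group has a wider spread to make the
# shape difference visible in the violin plot.
rng = np.random.default_rng(0)
group_a = rng.normal(5, 1, 30)
group_b = rng.normal(4, 2, 30)

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(8, 3))

ax_box.boxplot([group_a, group_b])
ax_box.set_xticklabels(["Intervention", "Control"])

ax_violin.violinplot([group_a, group_b], showmedians=True)
ax_violin.set_xticks([1, 2])
ax_violin.set_xticklabels(["Intervention", "Control"])

fig.savefig("mann_whitney_plots.png", dpi=150)
```

For small samples, overlaying the raw points (for example with `ax_box.scatter`) on the box plot keeps individual observations visible alongside the summary statistics.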
Common Mistakes to Avoid
The most frequent error is reporting means and standard deviations alongside a Mann-Whitney U. If your data met the assumptions for comparing means, you would have used a t-test. Stick with medians and IQRs, or mean ranks when distribution shapes differ.
Another common mistake is omitting the effect size. A p-value tells you whether a difference is statistically significant, but with large enough samples, tiny differences become significant. The effect size r tells your reader whether the difference is actually meaningful. Many journals now require it, and reviewers will flag its absence.
Finally, avoid reporting only the U statistic without the Z-score. The Z-score is what allows readers to evaluate the result on a standardized scale, and it is what you need to calculate the effect size. Most statistical software outputs both, so there is no reason to leave it out.
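If your software reports only U, the Z-score can be recovered from the large-sample normal approximation. The sketch below omits the tie correction, so it will differ slightly from software output when the data contain ties:

```python
import math

def z_from_u(u: float, n1: int, n2: int) -> float:
    """Normal-approximation Z for Mann-Whitney U (no tie correction)."""
    mean_u = n1 * n2 / 2          # expected U under the null hypothesis
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u

# Hypothetical values: U = 20 with n1 = n2 = 10.
print(round(z_from_u(20, 10, 10), 2))  # -2.27
```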