F critical (often written F crit) is the threshold value in an ANOVA test that your calculated F statistic must exceed for you to conclude that at least one group mean is significantly different from the others. Think of it as a cutoff on a number line: if your F statistic lands to the right of F crit, you reject the null hypothesis that all group means are equal. If it falls to the left, you don’t have enough evidence to say the means differ.
How F Crit Works as a Decision Boundary
ANOVA compares how much variation exists between your groups to how much variation exists within them. That comparison produces a single number called the F statistic, calculated by dividing the mean square between groups by the mean square within groups. When the groups truly have the same population mean, this ratio hovers around 1, because the variation between groups is roughly the same as the variation within them. The further the F statistic climbs above 1, the stronger the evidence that something real is driving the group differences.
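That ratio can be computed by hand. The sketch below works through a one-way ANOVA F statistic in plain Python using small made-up groups (the numbers are illustrative, not from any real study):

```python
# Computing the one-way ANOVA F statistic by hand (illustrative data).
groups = [
    [80, 85, 90],   # group A
    [70, 75, 80],   # group B
    [60, 65, 70],   # group C
]

k = len(groups)                          # number of groups
N = sum(len(g) for g in groups)          # total observations
grand_mean = sum(x for g in groups for x in g) / N

# Between-group sum of squares: how far each group mean sits from the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of points around their own group mean
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

msb = ssb / (k - 1)                      # mean square between (df1 = k - 1)
msw = ssw / (N - k)                      # mean square within  (df2 = N - k)
f_stat = msb / msw
print(f_stat)                            # 12.0 for this data
```

Here the group means (85, 75, 65) are spread far apart relative to the tight within-group scatter, so the ratio lands well above 1.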
F crit tells you exactly how far above 1 the F statistic needs to go before you can call the result statistically significant. If your F statistic is larger than F crit, you reject the null hypothesis and conclude that not all group means are equal. If your F statistic is smaller than F crit, you fail to reject the null hypothesis: the observed differences could plausibly be due to chance.
Three Values That Determine F Crit
You need three pieces of information to look up or calculate F crit:
- Significance level (alpha): The probability of a false positive you’re willing to tolerate, most commonly set at 0.05 (5%).
- Numerator degrees of freedom (df1): The number of groups minus 1. If you’re comparing 4 groups, df1 = 3.
- Denominator degrees of freedom (df2): The total number of observations across all groups minus the number of groups. If you have 40 total observations across 4 groups, df2 = 36.
Change any one of these and F crit changes too. The most dramatic shifts come from changing the significance level. For a concrete example with 1 numerator and 1 denominator degree of freedom, F crit at the 0.10 level is about 39.9, at the 0.05 level it jumps to 161.4, and at the 0.01 level it soars to 4,052.2. A stricter alpha always raises the bar, making it harder to declare significance.
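You can reproduce those numbers directly. This sketch, assuming SciPy is available, uses the inverse survival function to pull the right-tail critical value at each alpha:

```python
# How F crit shifts with alpha for df1 = 1, df2 = 1 (SciPy assumed available).
from scipy.stats import f

df1, df2 = 1, 1
for alpha in (0.10, 0.05, 0.01):
    # isf gives the value with `alpha` probability in the right tail
    print(alpha, round(f.isf(alpha, df1, df2), 1))
# 0.10 -> ~39.9, 0.05 -> ~161.4, 0.01 -> ~4052.2
```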
Why ANOVA Is Always a Right-Tailed Test
The F distribution is not symmetric like the bell curve you may be used to. It starts at zero, peaks early, and trails off to the right. Because ANOVA only cares whether the between-group variation is large relative to the within-group variation, the rejection region sits entirely in the right tail of this distribution. F crit marks where that rejection region begins. Every F value to its right falls inside the rejection region, and every F value to its left falls outside it.
This is different from a two-tailed t-test, where extreme values in either direction can be significant. In ANOVA, a very small F statistic (well below 1) simply means the groups look similar. Only large F values signal meaningful differences.
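A quick sketch (again assuming SciPy) makes the one-sided nature concrete: only values past F crit, on the right, trigger rejection, while small F values never do:

```python
# Only the right tail rejects: small F values are never "significant" in ANOVA.
from scipy.stats import f

df1, df2 = 3, 36
f_crit = f.isf(0.05, df1, df2)           # rejection region starts here (~2.87)
for f_stat in (0.3, 1.0, 5.0):
    decision = "reject" if f_stat > f_crit else "fail to reject"
    print(f_stat, decision)              # only 5.0 clears the threshold
```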
F Crit vs. P-Value
F crit and p-values are two ways of reaching the same conclusion. F crit divides the area under the F distribution curve into a rejection region and a non-rejection region. The p-value, on the other hand, tells you the exact probability of getting an F statistic as large as (or larger than) the one you actually observed, assuming the null hypothesis is true.
The two approaches always agree. When your F statistic exceeds F crit, the p-value will be less than your alpha level. When your F statistic falls short of F crit, the p-value will be greater than alpha. Most statistical software reports both, so you can use whichever feels more intuitive. The p-value gives you a precise probability, while F crit gives you a simple pass/fail comparison.
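The agreement between the two rules can be verified numerically. In this sketch (SciPy assumed), the survival function gives the right-tail p-value, and the comparison against F crit matches the comparison of p against alpha for every trial value:

```python
# The two decision rules agree: F > F crit exactly when p < alpha.
from scipy.stats import f

alpha, df1, df2 = 0.05, 3, 36
f_crit = f.isf(alpha, df1, df2)          # right-tail critical value
for f_stat in (1.5, 3.0, 6.0):
    p = f.sf(f_stat, df1, df2)           # P(F >= f_stat) under the null
    assert (f_stat > f_crit) == (p < alpha)
    print(f_stat, round(p, 4), f_stat > f_crit)
```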
How to Find F Crit
Textbooks include printed F distribution tables, but these only cover common alpha levels (0.01, 0.05, 0.10) and a limited range of degrees of freedom. For anything beyond those, software is easier and more precise.
In Excel or Google Sheets, use the function F.INV.RT(probability, df1, df2). The “probability” argument is your alpha level. So for a test with alpha = 0.05, 3 numerator degrees of freedom, and 36 denominator degrees of freedom, you’d type =F.INV.RT(0.05, 3, 36). The function returns the exact F crit value. In R, the equivalent is qf(0.05, 3, 36, lower.tail = FALSE). Python users can call scipy.stats.f.ppf(0.95, 3, 36), flipping the probability because the function works from the left tail by default.
If you run a one-way ANOVA through Excel’s Data Analysis add-in, the output table automatically includes both the F statistic and F crit side by side, so you can compare them at a glance.
Degrees of Freedom Shape the Threshold
As your sample size grows, the denominator degrees of freedom increase and F crit drops. This makes intuitive sense: with more data, you need less dramatic group differences to be confident those differences are real. A study comparing 3 groups with 10 people each (df1 = 2, df2 = 27) has a higher F crit at alpha = 0.05 than the same study with 50 people per group (df1 = 2, df2 = 147).
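The drop is easy to see by computing F crit across a few sample sizes (sketch assumes SciPy):

```python
# F crit falls as df2 grows (alpha = 0.05, df1 = 2 throughout).
from scipy.stats import f

for df2 in (27, 147, 1000):
    print(df2, round(f.isf(0.05, 2, df2), 3))
# df2 = 27 gives ~3.354; larger samples push the threshold down toward ~3.0
```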
The numerator degrees of freedom also matter. Comparing more groups raises df1, which changes the shape of the F distribution and, with it, the threshold. Because a single F test evaluates all group means at once against one critical value, it holds the overall false-positive rate at your chosen alpha, something a series of separate pairwise t-tests would not do.
Assumptions That Must Hold
F crit is only a valid threshold when four assumptions are met. First, the observations must be independent of each other, which proper randomization in your study design handles. Second, the residuals (the gaps between each data point and its group mean) need to be approximately normally distributed. Third, the variance within each group should be roughly equal, a property called homogeneity of variance that can be checked with tests like Levene’s test. Fourth, the effects in your model should be additive, meaning the group effect and random error simply add together without interacting in unexpected ways.
Violations of these assumptions can make F crit unreliable. If group variances are very unequal, for instance, the true false-positive rate may be higher or lower than the alpha you chose. ANOVA is reasonably robust to mild violations of normality, especially with larger sample sizes, but severe departures call for alternative methods like the Welch ANOVA or nonparametric tests.
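The homogeneity check mentioned above is straightforward to run. This sketch applies SciPy's implementation of Levene's test to small made-up groups (the data and the 0.05 cutoff are illustrative):

```python
# Checking homogeneity of variance with Levene's test (illustrative data).
from scipy.stats import levene

group_a = [80, 85, 90, 88, 82]
group_b = [70, 75, 80, 78, 72]
group_c = [60, 65, 70, 68, 62]

stat, p = levene(group_a, group_b, group_c)
# A small p-value (below your alpha) suggests unequal variances,
# in which case Welch's ANOVA is a safer choice than the classic F test.
print(round(stat, 3), round(p, 3))
```

These particular groups have identical spreads (they are the same pattern shifted up and down), so the test finds no evidence of unequal variances.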
A Quick Walkthrough
Suppose you’re testing whether three different teaching methods produce different exam scores. You have 30 students total, 10 per group. Your numerator degrees of freedom are 3 minus 1, giving you 2. Your denominator degrees of freedom are 30 minus 3, giving you 27. At alpha = 0.05, you look up or compute F crit for (2, 27), which is approximately 3.35.
You run the ANOVA and get an F statistic of 4.82. Because 4.82 is greater than 3.35, you reject the null hypothesis. At least one teaching method produced a meaningfully different average score. The corresponding p-value would be below 0.05, confirming the same conclusion. If your F statistic had come out to, say, 2.10, it would fall short of the 3.35 threshold, and you’d conclude that the observed score differences are consistent with normal random variation.
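The walkthrough's numbers can be checked in a few lines (sketch assumes SciPy; 4.82 is the hypothetical observed F from the example):

```python
# Verifying the walkthrough: F crit for (2, 27) at alpha = 0.05,
# and the decision for an observed F statistic of 4.82.
from scipy.stats import f

alpha, df1, df2 = 0.05, 2, 27
f_crit = f.isf(alpha, df1, df2)
print(round(f_crit, 2))                  # 3.35

f_stat = 4.82
p_value = f.sf(f_stat, df1, df2)         # right-tail p-value
print(f_stat > f_crit, p_value < alpha)  # True True: reject the null
```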

