How to Interpret Multinomial Logistic Regression Results

Multinomial logistic regression predicts an outcome variable that has three or more unordered categories, like type of transportation (car, bus, bike) or choice of academic program (vocational, general, academic). Interpreting the output requires understanding one key idea: every coefficient describes the relationship between a predictor and the odds of one outcome category compared to a single reference category. Once that clicks, the rest follows naturally.

The Reference Category Is Everything

Unlike binary logistic regression, which compares two outcomes (yes vs. no), multinomial logistic regression compares each outcome category against one designated “reference” or “baseline” category. Your software picks this automatically, usually the first or last category, but you can change it. Every coefficient in your output is a comparison to that baseline.

If your outcome has three categories (say academic, general, and vocational programs) and vocational is the reference, the model actually estimates two separate equations: one comparing general to vocational, and another comparing academic to vocational. Each equation has its own set of coefficients. This means a single predictor like socioeconomic status will appear twice in your output, once for each comparison, and the coefficients can be completely different in size and direction.

Changing the reference category changes every coefficient in the output, even though the underlying model is identical. If you need a comparison that isn’t in your current output (say, academic vs. general), you can rerun the model with a different reference category. The choice of reference doesn’t affect the model’s predictions or overall fit, only how the results are displayed.

What the Coefficients Mean

Each coefficient (often labeled “B” or “Estimate” in software output) is a log-odds value. Specifically, it represents the change in the log of the ratio between the probability of a given outcome and the probability of the reference outcome, for a one-unit increase in that predictor. That’s a mouthful, so here’s a concrete example.

In UCLA’s analysis of student program choice, a one-unit increase in writing score was associated with a 0.058 increase in the log-odds of being in the general program versus the vocational program, and a 0.114 increase in the log-odds of being in the academic program versus the vocational program. The writing score pushed students toward academic programs more strongly than toward general programs, relative to vocational.
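Because both equations share the same baseline, you can also back out the comparison the output doesn’t print. A quick sketch using the two slopes quoted above (intercepts omitted):

```python
b_general = 0.058    # write slope: general vs. vocational
b_academic = 0.114   # write slope: academic vs. vocational

# log(P(acad)/P(gen)) = log(P(acad)/P(voc)) - log(P(gen)/P(voc)),
# so the implied academic-vs-general slope is simply the difference:
print(round(b_academic - b_general, 3))

# Log-odds shift linearly: a 10-point rise in write score adds
# 10 times the slope to each comparison's log-odds.
print(round(10 * b_general, 2), round(10 * b_academic, 2))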
For categorical predictors, the logic is similar but the comparison is between groups rather than units. Moving from high socioeconomic status to middle socioeconomic status decreased the log-odds of being in the general program versus vocational by 0.645. In plain terms: middle-SES students were less likely than high-SES students to be in the general program compared to vocational.

Converting to Odds Ratios

Raw log-odds coefficients are hard to interpret intuitively. Most researchers exponentiate them (raise e to the power of the coefficient) to get odds ratios, which are easier to explain. If a coefficient is 0.114, the odds ratio is e^0.114 = 1.12, meaning a one-unit increase in that predictor multiplies the odds of that outcome (vs. the reference) by 1.12, or increases them by about 12%.

An odds ratio above 1 means the predictor increases the odds of that outcome relative to the reference. Below 1 means it decreases them. Exactly 1 means no effect. Many statistical packages will output exponentiated coefficients directly if you ask for them (often labeled “Exp(B)”).

One important caveat: odds ratios are always conditional on the other variables in the model. An odds ratio of 1.5 for smoking would change if you added or removed other predictors like income or age. Comparing odds ratios across different models or different datasets is not straightforward.

Reading the Output Table

A typical output table from SPSS, R, or Stata will show results grouped by outcome category (excluding the reference). Within each group, you’ll see a row for each predictor. The key columns are:

  • B or Estimate: The log-odds coefficient described above.
  • Standard Error: How precisely the coefficient is estimated; smaller values mean a more precise estimate.
  • Wald statistic: A chi-square test of whether that individual coefficient is significantly different from zero.
  • p-value (Sig.): The probability of seeing a coefficient at least this extreme if the true effect were zero. Values below 0.05 are conventionally considered statistically significant.
  • Exp(B): The exponentiated coefficient, or odds ratio.
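The Wald statistic, p-value, and Exp(B) columns are all derived from B and its standard error, so you can reproduce them by hand. A sketch with hypothetical values (B = 0.114 is quoted in the example above; the standard error here is invented):

```python
import math

b, se = 0.114, 0.022          # hypothetical coefficient and standard error

z = b / se                    # z-statistic
wald = z ** 2                 # Wald chi-square with 1 df
exp_b = math.exp(b)           # odds ratio

# Two-sided p-value from the standard normal (equivalent to the
# 1-df chi-square test on the Wald statistic)
p = math.erfc(abs(z) / math.sqrt(2))

print(f"Wald = {wald:.2f}, p = {p:.2g}, Exp(B) = {exp_b:.3f}")
```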

Because each predictor appears in multiple equations (one per non-reference category), you might find that a variable is significant in one comparison but not another. A predictor could strongly distinguish academic from vocational students while having no effect on the general vs. vocational comparison. This is normal and actually one of the strengths of the model: it captures different dynamics across outcome categories.

Testing Whether a Predictor Matters Overall

The individual Wald tests tell you whether a predictor is significant for one specific comparison. But if you want to know whether a predictor matters for the outcome overall, across all categories simultaneously, you need a likelihood ratio test.

This test compares your full model (with the predictor included) to a reduced model (without it). A significant result means the predictor improves the model’s ability to distinguish between outcome categories taken as a whole. In one example analysis, socioeconomic status had a likelihood ratio chi-square of 12.917 with a p-value of .012, indicating it significantly predicted program choice overall, even though its effect varied across individual comparisons.
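The chi-square arithmetic is easy to reproduce. Below is a sketch with hypothetical log-likelihoods chosen to mirror the numbers above; SES contributes two dummy coefficients in each of two equations, hence 4 degrees of freedom, and the closed-form survival function used here is valid only for even degrees of freedom:

```python
import math

def chi2_sf(x, df):
    # Survival function of the chi-square distribution; closed form
    # for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    assert df % 2 == 0
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

# Hypothetical log-likelihoods: full model vs. model without SES
ll_full, ll_reduced = -179.98, -186.44
lr = 2 * (ll_full - ll_reduced)   # LR chi-square
p = chi2_sf(lr, df=4)
print(f"LR chi-square = {lr:.2f}, p = {p:.3f}")
```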

Most software also reports an overall model likelihood ratio test that compares your entire model to a null model with no predictors at all. A significant result here (p < 0.05) tells you the model as a whole fits better than an intercept-only model that assigns every case the same probabilities. In the same analysis, the overall chi-square was 74.29 with p < 0.001.

Assessing Model Fit

Multinomial logistic regression doesn’t produce a traditional R-squared value like linear regression. Instead, you’ll typically see pseudo R-squared measures that approximate how much variation the model explains. These tend to be lower than what you’d see in linear regression, so don’t panic if they look modest.

When comparing two competing models (say, one with three predictors and one with five), AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are your best tools. Lower values indicate better fit. These metrics penalize model complexity, so they help you avoid overfitting by adding too many predictors. If two models produce AIC values of 366.9 and 316.0, the second model fits meaningfully better.
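Both criteria are simple functions of the maximized log-likelihood and the parameter count, so they are easy to compute from any fit. A sketch with hypothetical inputs (the log-likelihoods and parameter counts are invented, chosen so the AICs match the values above):

```python
import math

def aic(loglik, k):
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    return -2 * loglik + k * math.log(n)

# Hypothetical: smaller model (6 params) vs. larger model (8 params), n = 200
print(f"AIC: {aic(-177.45, 6):.1f} vs. {aic(-150.0, 8):.1f}")
print(f"BIC: {bic(-177.45, 6, 200):.1f} vs. {bic(-150.0, 8, 200):.1f}")
# The larger model wins here despite the extra-parameter penalty,
# because its log-likelihood improvement is large enough.
```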

For grouped data, a deviance statistic (G²) can test whether your model fits significantly worse than a saturated model that perfectly reproduces the observed data. A non-significant result is good news: it means your simpler model fits adequately.

Using Predicted Probabilities

Log-odds and odds ratios are useful for statistical inference, but predicted probabilities are what most people actually find intuitive. For any combination of predictor values, the model can output the predicted probability of falling into each outcome category. These probabilities always sum to 1 across categories.

For instance, a student with certain characteristics might have predicted probabilities of 0.15 for academic, 0.34 for general, and 0.51 for vocational. You can immediately see which outcome is most likely and by how much. Generating these probabilities for different predictor values (say, low vs. high socioeconomic status while holding writing scores at their average) gives you a clear picture of how each predictor shifts the distribution of outcomes.

This approach is especially effective for communicating results to non-technical audiences. Rather than saying “the log-odds coefficient is 0.114,” you can show that increasing writing scores from 40 to 60 shifts the predicted probability of the academic program from 20% to 45%, which is immediately meaningful.
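Mechanically, turning the estimated equations into probabilities is a softmax over the log-odds, with the reference category pinned at zero. Here is a sketch using the write slopes quoted earlier plus invented intercepts (so the exact percentages it produces won’t match the figures in the text):

```python
import math

# Slopes from the example; intercepts are invented for illustration
EQUATIONS = {
    "vocational": (0.0, 0.0),     # reference: log-odds fixed at zero
    "general":    (-1.0, 0.058),
    "academic":   (-4.0, 0.114),
}

def predicted_probs(write):
    # Exponentiate each category's log-odds, then normalize so the
    # probabilities sum to 1 across categories.
    scores = {prog: math.exp(a + b * write) for prog, (a, b) in EQUATIONS.items()}
    total = sum(scores.values())
    return {prog: s / total for prog, s in scores.items()}

for w in (40, 60):
    probs = predicted_probs(w)
    line = ", ".join(f"{prog}={p:.2f}" for prog, p in probs.items())
    print(f"write={w}: {line}")
```

Sweeping a predictor through a range of values this way is exactly how the probability comparisons described above are generated.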

The Independence Assumption to Watch For

Multinomial logistic regression carries an assumption called the independence of irrelevant alternatives (IIA). It states that the relative probability of choosing between any two options should not change if a third option is added or removed. The classic illustration: if someone is equally likely to take a red bus or a car, adding a blue bus shouldn’t change their relative preference between the red bus and the car. But in practice it might, because the blue bus draws riders away from the red bus more than from the car.

If IIA is violated, your model’s estimates may be biased. Several chi-square based tests exist to check for IIA violations, though they can be sensitive to sample size and aren’t always conclusive. If you have strong reason to believe your outcome categories are substitutes for each other in ways that violate IIA, alternative models like nested logit or mixed logit may be more appropriate.

Multinomial vs. Ordinal Logistic Regression

If your outcome categories have a natural order (like mild, moderate, severe), ordinal logistic regression is typically the better choice because it uses that ordering information, giving you more statistical power with fewer parameters. Multinomial logistic regression treats all categories as unordered, so it ignores ranking entirely.

That said, ordinal regression requires its own assumption: that each predictor has the same effect across all levels of the outcome (the parallel regression or proportional odds assumption). If your data violate that assumption, multinomial logistic regression is a valid fallback, even for ordered outcomes. You lose some efficiency but gain flexibility in letting each predictor have different effects for different category comparisons.