A discriminant function is a mathematical formula that takes measurements about something and produces a score used to classify it into one of two or more groups. If you measure several characteristics of an unknown specimen, patient, or data point, the discriminant function combines those measurements (each given a specific weight) into a single number that tells you which group the observation most likely belongs to. It’s one of the most widely used tools in statistics for sorting things into categories based on multiple variables at once.
How a Discriminant Function Works
At its core, a discriminant function is a weighted sum of your measured variables. Imagine you’re trying to classify an animal as Species A or Species B based on its body length, weight, and tail length. The discriminant function assigns each of those measurements a coefficient (a weight reflecting how important that variable is for telling the species apart), multiplies each measurement by its weight, adds them together, and produces a single score.
The general form looks like this: score = constant + (weight₁ × variable₁) + (weight₂ × variable₂) + … and so on for however many variables you have. If the score lands above a certain threshold, the observation is classified into Group 1. If it falls below, it goes into Group 2. For problems with more than two groups, separate functions are calculated, and the observation is assigned to whichever group gives the highest score.
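That scoring rule can be sketched in a few lines of Python. The constant, weights, and measurements below are made-up illustration values for the species example, not coefficients from any fitted model:

```python
# Sketch of a two-group discriminant score: a weighted sum plus a
# constant, classified by whether it clears a threshold (here, zero).
# All numbers are invented for illustration.

def discriminant_score(constant, weights, measurements):
    # score = constant + (weight1 * variable1) + (weight2 * variable2) + ...
    return constant + sum(w * x for w, x in zip(weights, measurements))

weights = [0.8, 0.02, -0.5]   # body length, weight, tail length (hypothetical)
score = discriminant_score(-3.0, weights, [12.0, 150.0, 4.0])
label = "Species A" if score > 0 else "Species B"
```

For more than two groups, you would compute one such score per group and assign the observation to the group with the highest score.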
The weights aren’t chosen arbitrarily. They’re calculated to maximize the separation between groups while minimizing the variation within each group. Think of it as finding the angle from which the groups look most distinct from each other. A variable that differs a lot between groups gets a larger weight; one that barely differs gets a smaller weight or may contribute almost nothing.
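For the two-group case, one standard way those weights are derived is Fisher's rule: invert the pooled within-group scatter matrix and apply it to the difference of the group means. A toy sketch with invented data and two variables, in pure Python:

```python
# Fisher's two-group weights: w = Sw^{-1} (mean1 - mean2), where Sw is
# the pooled within-group scatter matrix. Toy data for illustration.

def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, mu):
    # 2x2 within-group scatter: summed outer products of deviations
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

group1 = [[4.0, 2.0], [4.5, 2.2], [5.0, 2.1]]
group2 = [[2.0, 1.0], [2.5, 1.2], [3.0, 0.9]]
m1, m2 = mean(group1), mean(group2)
sw, s2 = scatter(group1, m1), scatter(group2, m2)
for i in range(2):
    for j in range(2):
        sw[i][j] += s2[i][j]          # pool the two within-group scatters

# Invert the 2x2 pooled scatter and apply it to the mean difference.
det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
inv = [[sw[1][1] / det, -sw[0][1] / det],
       [-sw[1][0] / det, sw[0][0] / det]]
diff = [m1[0] - m2[0], m1[1] - m2[1]]
w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
     inv[1][0] * diff[0] + inv[1][1] * diff[1]]
```

Dividing by the within-group scatter is what shrinks the weight of a noisy variable: a large mean difference only earns a large weight if the variable is also stable within each group.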
The Classification Rule
Behind the scenes, the classification decision follows a probability-based logic. For each possible group, the method calculates a posterior probability: the probability that the observation belongs to that group, given its measurements. The observation is then assigned to whichever group has the highest posterior probability.
This probability calculation takes into account two things: how well the observation’s measurements match the typical pattern of each group, and the prior probability of each group (essentially, how common each group is in the population). If 90% of your population belongs to Group A, an ambiguous observation gets a nudge toward Group A unless its measurements strongly suggest otherwise.
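That prior-plus-likelihood logic is just Bayes' rule. A sketch with normal likelihoods and the 90%/10% priors from the example; the group means and spreads are illustrative numbers, not fitted estimates:

```python
# Posterior probability per group: prior * likelihood, normalized.
# Group parameters and priors are invented for illustration.
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

priors = {"A": 0.9, "B": 0.1}
params = {"A": (10.0, 2.0), "B": (14.0, 2.0)}   # (mean, sd) per group

x = 12.0   # deliberately ambiguous: exactly midway between the two means
scores = {g: priors[g] * normal_pdf(x, *params[g]) for g in priors}
total = sum(scores.values())
posteriors = {g: s / total for g, s in scores.items()}
# The likelihoods are equal at x = 12, so the priors decide: P(A | x) = 0.9
```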
Linear vs. Quadratic Discriminant Functions
The most common version is the linear discriminant function, which draws straight-line boundaries between groups. It assumes that each group’s data follows a roughly bell-shaped (normal) distribution and, critically, that all groups share the same spread and correlation pattern among their variables. When these assumptions hold, the math simplifies to a straight boundary, and the method is both efficient and accurate.
When groups have noticeably different spreads or correlation patterns, a quadratic discriminant function is more appropriate. Instead of a straight boundary, it draws a curved one that can flex to accommodate the different shapes of each group’s data cloud. In practice, though, quadratic discriminant analysis requires estimating many more parameters, which means it needs more data to work well. With smaller datasets, it tends to overfit, performing beautifully on the training data but poorly on new observations. For this reason, the linear version is used far more often and tends to produce better accuracy across a wider range of real-world datasets.
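The difference can be sketched in one dimension. When each group keeps its own variance, the log-score is quadratic in the measurement, so the decision boundary can have two crossing points, which no single straight threshold can reproduce. Toy numbers throughout:

```python
# One-dimensional quadratic discriminant: each group has its own
# variance, so the sigma-dependent terms (which the linear version
# drops by pooling) stay in the score. Illustrative parameters only.
import math

def qda_score(x, mu, sigma, prior):
    # log posterior up to a constant shared by all groups
    return (-math.log(sigma)
            - (x - mu) ** 2 / (2 * sigma ** 2)
            + math.log(prior))

def classify(x):
    a = qda_score(x, 0.0, 1.0, 0.5)   # narrow group A
    b = qda_score(x, 0.0, 3.0, 0.5)   # wide group B, same center
    return "A" if a > b else "B"

labels = [classify(x) for x in (-5.0, 0.0, 5.0)]
# Central values go to the narrow group, extreme values to the wide one:
# the boundary is a pair of points, something a linear rule cannot draw.
```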
Key Assumptions
For the results to be trustworthy, linear discriminant analysis relies on a few conditions. Each predictor variable should be roughly normally distributed. The variance-covariance matrices (the patterns of spread and correlation among variables) should be similar across groups. And the groups should ideally be of similar size. In practice, especially in clinical and biological research, these assumptions are frequently violated to some degree. Mild violations usually don’t cause major problems, but severe departures from normality or very unequal group variances can distort the classification boundaries and lead to unreliable results.
Interpreting the Coefficients
Once a discriminant function is built, the standardized coefficients tell you which variables contribute the most to separating the groups. These work much like standardized coefficients in regression: a larger absolute value means that variable plays a bigger role in distinguishing between groups. If your discriminant function for classifying tumors as benign or malignant gives a standardized coefficient of 0.75 to cell size but only 0.12 to patient age, cell size is doing most of the heavy lifting in the classification.
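One common convention for obtaining those values, sketched here with invented numbers chosen to mirror the tumor example: scale each raw coefficient by its variable's pooled within-group standard deviation, so that magnitudes become comparable across variables measured in different units:

```python
# Standardized coefficient = raw coefficient * pooled within-group SD.
# Raw coefficients and SDs below are made-up illustration values.
raw_coefs = {"cell_size": 2.5, "patient_age": 0.01}
pooled_sd = {"cell_size": 0.30, "patient_age": 12.0}

std_coefs = {v: raw_coefs[v] * pooled_sd[v] for v in raw_coefs}
# cell_size: 0.75, patient_age: 0.12 -- cell size dominates the separation
```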
This makes discriminant analysis useful not just for prediction but for understanding your data. It reveals which measurements actually matter for telling groups apart and which ones you could drop without losing much accuracy.
Measuring How Well It Works
The overall significance of a discriminant function is commonly tested using a statistic called Wilks’ Lambda. This value ranges from 0 to 1 and is the ratio of within-group variation to the total variation in the data. A value close to zero means most of the variation lies between the groups rather than within them, so the groups are well separated by the function, which is what you want. A value close to one means the function isn’t doing much better than random guessing. If Wilks’ Lambda is small enough, you can conclude that the discriminant function is capturing real, statistically meaningful differences between groups.
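For a single variable, Wilks' Lambda reduces to the within-group sum of squares divided by the total sum of squares (in general it is a ratio of determinants). A sketch with toy data showing the two extremes:

```python
# Univariate Wilks' Lambda: within-group SS / total SS.
# Near 0 when groups are far apart; near 1 when they overlap.
def wilks_lambda(groups):
    all_vals = [x for g in groups for x in g]
    grand = sum(all_vals) / len(all_vals)
    total_ss = sum((x - grand) ** 2 for x in all_vals)
    within_ss = 0.0
    for g in groups:
        m = sum(g) / len(g)
        within_ss += sum((x - m) ** 2 for x in g)
    return within_ss / total_ss

separated = [[1.0, 1.1, 0.9], [9.0, 9.1, 8.9]]     # tight, far-apart groups
overlapping = [[1.0, 5.0, 9.0], [1.1, 5.1, 9.1]]   # nearly identical groups
```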
Discriminant Functions vs. Similar Methods
Two methods often come up in comparison: logistic regression and principal component analysis.
Logistic regression solves a similar problem, predicting group membership from a set of variables. The key difference is in assumptions. Discriminant analysis assumes normally distributed predictors with equal covariance across groups. When those assumptions are met, discriminant analysis is actually more statistically efficient than logistic regression, meaning it extracts more information from the same amount of data. But logistic regression makes no such assumptions about the distribution of predictors, making it the safer choice when your data are clearly non-normal, when you have categorical predictors, or when you simply aren’t sure whether the normality assumption holds.
Principal component analysis (PCA) is a different animal entirely. PCA finds combinations of variables that capture the most overall variation in a dataset, with no knowledge of which observations belong to which group. It’s purely descriptive. A discriminant function, by contrast, uses group labels and specifically seeks combinations of variables that maximize the differences between groups while discounting variation that doesn’t help with classification. Two variables might show enormous overall variation (making them important in PCA) while being nearly identical across groups (making them useless in a discriminant function). This supervised quality, using labeled training data to learn what separates groups, is what makes discriminant analysis a classification tool rather than just a data-description tool.
Common Applications
Discriminant functions show up across a wide range of fields. In medicine, they help classify patients into diagnostic categories based on combinations of lab values or symptoms. In forensic anthropology, they’re used to estimate sex or ancestry from skeletal measurements. In ecology, researchers use them to classify species or habitats based on environmental variables. In marketing, they can predict which customer segment a person belongs to based on purchasing behavior.
The appeal in all these cases is the same: you have multiple measurements, you have known groups, and you want a principled, data-driven rule for assigning new observations to the right group. The discriminant function gives you that rule in a form that’s interpretable, efficient, and grounded in probability.