A logit is the natural logarithm of the odds of something happening. If you have a probability (say, a 75% chance of rain), the logit converts it into a number that can range from negative infinity to positive infinity. This transformation is useful because probabilities are stuck between 0 and 1, which creates problems when you try to use them in standard mathematical equations. The logit removes that constraint.
The term was coined by statistician Joseph Berkson in 1944, short for “logistic unit.” He proposed it as a simpler alternative to a similar tool called the probit, and it has since become one of the most widely used transformations in statistics, medicine, and machine learning.
The Core Idea: Turning Probability Into Odds, Then Taking the Log
The logit function works in two steps. First, it converts a probability into odds. If the probability of an event is p, the odds are p divided by (1 minus p). A 75% probability becomes 0.75 / 0.25, which equals odds of 3 to 1. Second, it takes the natural logarithm of those odds. So the logit of 0.75 is ln(3), which is about 1.10.
Written as a formula: logit(p) = ln(p / (1 − p)).
A few reference points make this easier to feel intuitively. When the probability is exactly 50/50, the odds are 1 to 1, and the log of 1 is zero. So a logit of zero means even odds. Probabilities above 50% produce positive logits, and probabilities below 50% produce negative logits. As the probability approaches 100%, the logit climbs toward positive infinity. As it approaches 0%, the logit drops toward negative infinity.
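The two-step calculation above can be sketched in a few lines of Python (the function name logit here is ours, not a library function):

```python
import math

def logit(p):
    """Log-odds of a probability p, where 0 < p < 1."""
    odds = p / (1 - p)      # step 1: probability -> odds
    return math.log(odds)   # step 2: natural log of the odds

print(logit(0.50))   # even odds -> 0.0
print(logit(0.75))   # ln(3), about 1.10
print(logit(0.25))   # symmetric: about -1.10
```

Note the symmetry around 0.5: swapping p for 1 − p just flips the sign of the logit.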
Why the Logit Exists
The logit solves a specific mathematical problem. Many statistical techniques, especially regression, work by fitting straight lines (or planes, in higher dimensions) to data. The output of a straight-line equation can be any number: 2.5, negative 40, a thousand. But when you’re trying to predict something binary, like whether a patient survives surgery or whether a customer clicks an ad, the answer you need is a probability between 0 and 1.
You can’t directly set a straight-line equation equal to a probability, because the line could easily produce values below 0 or above 1, which don’t make sense as probabilities. The logit function bridges this gap. It takes values locked in the 0-to-1 range and stretches them across the entire number line. This lets the math work both ways: you can run a standard linear equation, then convert the result back into a meaningful probability.
The Logit and the Sigmoid Are Inverses
If you’ve encountered the sigmoid (or logistic) function, you already know the logit’s mirror image. The sigmoid takes any number and squashes it into the 0-to-1 range using the formula: 1 / (1 + e^(-x)). The logit does the exact opposite: it takes a value between 0 and 1 and maps it back to any real number. Applying the logit and then the sigmoid (or vice versa) gets you right back where you started.
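A quick way to see the inverse relationship is to apply one function after the other and confirm you get the starting value back (up to floating-point rounding):

```python
import math

def sigmoid(x):
    """Squash any real number into (0, 1)."""
    return 1 / (1 + math.exp(-x))

def logit(p):
    """Map a probability in (0, 1) back to the real line."""
    return math.log(p / (1 - p))

p = 0.75
print(sigmoid(logit(p)))   # round trip returns ~0.75

x = 1.10
print(logit(sigmoid(x)))   # round trip returns ~1.10
```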
This inverse relationship is why the logit is sometimes called the “inverse logistic function” or the “log-odds function.” They’re all describing the same thing.
How Logits Are Used in Logistic Regression
Logistic regression is the most common statistical application of the logit. In this model, you’re predicting a yes-or-no outcome (disease or no disease, purchase or no purchase) based on a set of input variables. The model fits a linear equation to those inputs, and the output of that equation is a logit: a log-odds value. To get a predicted probability, you run the logit through the sigmoid function.
The logit serves as what statisticians call a “link function.” It links the probability you care about to the linear equation the model can actually fit. Without it, logistic regression wouldn’t work.
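As a sketch of that pipeline, here is a prediction step for a hypothetical fitted model with one predictor (the intercept and coefficient values are made up for illustration):

```python
import math

# Hypothetical fitted logistic regression: one intercept, one predictor.
intercept = -2.0
coef = 0.7
x = 3.0

# The linear equation outputs a logit (a log-odds value)...
log_odds = intercept + coef * x

# ...and the sigmoid converts it into a predicted probability.
prob = 1 / (1 + math.exp(-log_odds))
print(round(prob, 3))
```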
Each coefficient in a logistic regression model represents the change in the log-odds of the outcome for a one-unit increase in that predictor. To make this more interpretable, you can exponentiate the coefficient (raise e to its power) to get an odds ratio. For example, if a coefficient is 0.7, the odds ratio is e^0.7, or about 2.0, meaning a one-unit increase in that variable roughly doubles the odds of the outcome.
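The exponentiation step is a one-liner; using the coefficient of 0.7 from the example above:

```python
import math

coef = 0.7                    # change in log-odds per one-unit increase
odds_ratio = math.exp(coef)   # e^0.7, about 2.01

print(round(odds_ratio, 2))   # the odds roughly double per unit
```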
Clinical risk scores used in hospitals are frequently built this way. Tools that predict outcomes like 30-day readmission or early mortality after discharge typically start as logistic regression models, where the internal math runs entirely on logits. The final scores patients and doctors see are simplified versions of those logit-based calculations.
Logits in Machine Learning and AI
In deep learning, the word “logits” has taken on a slightly broader meaning. It refers to the raw, unnormalized output scores from the final layer of a neural network, before those scores are converted into probabilities. If a neural network is classifying an image as “cat,” “dog,” or “bird,” its last layer might output three numbers like [2.1, 0.4, -1.3]. Those are the logits.
These raw scores don’t sum to 1 and aren’t bounded between 0 and 1, so they aren’t probabilities yet. To turn them into probabilities, a function called softmax (a generalization of the sigmoid for multiple categories) is applied. The logits contain the same information as the final probabilities, just in an unconverted form. Many operations in training and fine-tuning neural networks are performed directly on the logits rather than the probabilities, because the math is more stable and efficient that way.
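Softmax itself is simple to write down. A minimal version, applied to the three cat/dog/bird logits from the example above (subtracting the maximum logit first is a standard trick that avoids overflow without changing the result):

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)                               # for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.1, 0.4, -1.3])
print(probs)   # largest logit ("cat") gets the largest probability
print(sum(probs))   # sums to 1
```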
How Logit Compares to Probit
The logit isn’t the only function that maps probabilities to the full number line. The probit does something similar, but instead of using the log of the odds, it uses the inverse of the standard normal cumulative distribution function. In practical terms, both approaches produce very similar results for most datasets. Logit models assume the underlying errors follow a logistic distribution, while probit models assume a normal (bell curve) distribution.
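The similarity is easy to check numerically. Python’s standard library exposes the inverse normal CDF via statistics.NormalDist, so we can compare the two transformations side by side (the logit values are roughly 1.6 to 1.8 times the probit values over the middle of the range, reflecting the different scales of the two distributions):

```python
import math
from statistics import NormalDist

def logit(p):
    return math.log(p / (1 - p))

def probit(p):
    return NormalDist().inv_cdf(p)   # inverse standard normal CDF

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(logit(p), 3), round(probit(p), 3))
```

Both functions are zero at p = 0.5, negative below it, positive above it, and unbounded at the extremes; they differ mainly in scale and in how heavy their tails are.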
Berkson originally promoted the logit as a computationally simpler alternative to the probit, which had been the standard tool in fields like toxicology and bioassay. That computational advantage mattered enormously in the 1940s and still gives the logit a slight edge in interpretability: logit coefficients translate directly into log-odds and odds ratios, which many researchers find more intuitive than probit coefficients. Today, logistic regression (using the logit) is far more common than probit regression in most fields, though probit remains popular in economics.

