A regressor is an input variable used to predict or explain an outcome in a statistical model. If you’re building a model to predict house prices, the regressors would be things like square footage, number of bedrooms, and neighborhood. The outcome you’re trying to predict (house price, in this case) is called the dependent variable or “regressand.” The regressors are the independent variables you feed into the model to make that prediction.
How Regressors Fit Into a Regression Equation
In its simplest form, a regression equation looks like this: Y = a + bX. Here, Y is the outcome you’re predicting, X is the regressor, “a” is a baseline value (the intercept), and “b” is the coefficient that tells you how much Y changes for every one-unit increase in X. If you were predicting someone’s weight from their height, height would be the regressor and weight would be the dependent variable. The coefficient “b” would tell you how many additional pounds to expect per additional inch of height.
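The height-and-weight example can be sketched in a few lines of Python. The numbers below are made up for illustration; the fit itself is ordinary least squares via numpy.

```python
import numpy as np

# Hypothetical data: heights in inches (the regressor X)
# and weights in pounds (the dependent variable Y)
heights = np.array([60, 62, 65, 68, 70, 72, 74])
weights = np.array([115, 125, 140, 155, 165, 180, 190])

# Fit Y = a + b*X by ordinary least squares.
# np.polyfit with degree 1 returns [slope, intercept].
b, a = np.polyfit(heights, weights, 1)

print(f"intercept a = {a:.1f}, slope b = {b:.2f}")
# b is the expected change in weight (pounds) per additional inch of height
```

On this toy data the slope lands around five pounds per inch, which is how you would read the coefficient: each extra inch of height predicts roughly that many extra pounds, on average.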
Most real-world models use multiple regressors at once. A model predicting blood pressure might include age, weight, sodium intake, and exercise frequency as regressors. Each one gets its own coefficient, representing its individual contribution to the prediction while holding the other variables constant. This is called multiple regression, and it’s the workhorse of most applied statistics.
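A multiple regression with several regressors is the same idea with a design matrix. The sketch below uses three of the blood-pressure regressors mentioned above (age, weight, exercise frequency) with invented values; each column gets its own coefficient.

```python
import numpy as np

# Hypothetical data: predict systolic blood pressure from age (years),
# weight (pounds), and weekly exercise hours. Values are illustrative only.
X = np.array([
    [45, 180, 2],
    [52, 200, 1],
    [37, 150, 5],
    [60, 210, 0],
    [48, 170, 3],
    [55, 190, 1],
], dtype=float)
y = np.array([128, 140, 112, 150, 125, 138], dtype=float)

# Prepend a column of ones so the model includes an intercept
X1 = np.column_stack([np.ones(len(X)), X])

# Solve by least squares: one coefficient per regressor, each interpreted
# as that variable's contribution while holding the others constant
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)
intercept, b_age, b_weight, b_exercise = coefs
```

In practice you would use a library such as statsmodels to also get standard errors and p-values, but the coefficients come from the same least-squares solve.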
Different Names for the Same Thing
The term “regressor” comes from classical statistics, but the same concept goes by different names depending on the field. In statistics, you’ll hear “independent variable,” “explanatory variable,” or “predictor.” In machine learning, the standard term is “feature.” In econometrics, researchers sometimes call them “covariates.” These all refer to the same thing: an input variable used to model an outcome. If you’re reading across disciplines, knowing these synonyms will save you confusion.
Continuous vs. Categorical Regressors
Regressors can be numeric (continuous) or categorical. A continuous regressor like temperature or income can take any value along a range, and it enters the model directly. Categorical regressors, like race, marital status, or country, require an extra step because you can’t meaningfully multiply “married” by a coefficient.
The standard solution is dummy coding: converting a categorical variable into a set of binary (0 or 1) variables. If your categorical regressor is “race” with four categories (Hispanic, Asian, African American, and white), you’d create three new binary variables. Each one flags whether an observation belongs to that category. One category, called the reference level, is represented by all zeros. The coefficients for the other categories then represent the difference from that reference group. So if white is the reference category, the coefficient for “Hispanic” tells you how the outcome differs, on average, for Hispanic individuals compared to white individuals.
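Dummy coding is mechanical enough to show in plain Python. This sketch uses the four-category race example above, with "white" as the reference level represented by all zeros.

```python
# Three indicator variables for a four-category regressor;
# the omitted category ("white") is the reference level.
categories = ["Hispanic", "Asian", "African American"]

def dummy_code(race):
    """Convert one categorical value into three 0/1 indicator variables."""
    return [1 if race == c else 0 for c in categories]

observations = ["Hispanic", "white", "Asian", "African American"]
coded = [dummy_code(r) for r in observations]
# "Hispanic"          -> [1, 0, 0]
# "white" (reference) -> [0, 0, 0]
# "Asian"             -> [0, 1, 0]
# "African American"  -> [0, 0, 1]
```

Libraries automate this (for example, pandas' `get_dummies` with `drop_first=True`), but the coefficients they produce are interpreted exactly as described: differences relative to the reference level.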
Choosing Which Regressors to Include
Not every available variable belongs in your model. Including too few regressors can omit important drivers of the outcome (omitted-variable bias), while including too many can make the model overfit the data or become hard to interpret. Several formal methods exist for selecting the right set of regressors.
Stepwise selection is one common approach: it adds or removes regressors one at a time, testing whether each one improves the model enough to justify its inclusion. The decision is typically guided by a scoring criterion. The Akaike information criterion (AIC) balances model fit against complexity, favoring models that explain the data well without unnecessary variables. The Bayesian information criterion (BIC) applies a stricter penalty, generally resulting in simpler models with fewer regressors. Different criteria can lead to different final models, so the choice depends on whether you prioritize predictive accuracy or interpretability.
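The AIC/BIC trade-off can be made concrete. For ordinary least squares with Gaussian errors, both criteria reduce (up to an additive constant) to a fit term based on the residual sum of squares plus a penalty per parameter. The numbers below are hypothetical fits chosen to show how the two criteria can disagree.

```python
import numpy as np

def aic(rss, n, k):
    # Gaussian-likelihood AIC up to a constant: fit term + 2 per parameter
    return n * np.log(rss / n) + 2 * k

def bic(rss, n, k):
    # BIC swaps the 2-per-parameter penalty for ln(n) per parameter
    return n * np.log(rss / n) + k * np.log(n)

# Compare a 2-parameter model (RSS = 250) against a 3-parameter
# model (RSS = 240) fitted to the same n = 100 observations
n = 100
aic_small, aic_big = aic(250.0, n, 2), aic(240.0, n, 3)
bic_small, bic_big = bic(250.0, n, 2), bic(240.0, n, 3)
```

With these numbers, AIC prefers the larger model (the fit improvement outweighs its milder penalty) while BIC's ln(100) ≈ 4.6 penalty per parameter favors the smaller one, illustrating why the two criteria can select different final models.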
When Regressors Overlap Too Much
A common problem in multiple regression is multicollinearity, which happens when two or more regressors are highly correlated with each other. If you include both “years of education” and “highest degree earned” as regressors, they carry largely redundant information. The model struggles to separate their individual effects because what one regressor explains about the outcome overlaps heavily with what the other explains.
The practical consequences are significant. Coefficient estimates become unstable, meaning small changes in the data can produce large swings in the estimated effect of each regressor. Standard errors inflate, making it harder to detect effects that genuinely exist. Coefficients can even flip to the wrong sign, suggesting a variable has the opposite effect from what’s actually true. The overall model might still predict well, but interpreting any single regressor’s contribution becomes unreliable.
The usual fix is to remove one of the overlapping regressors, combine correlated variables into a single composite, or use techniques specifically designed to handle collinearity, such as ridge regression or principal components regression.
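One standard diagnostic for this problem, not named above, is the variance inflation factor (VIF): regress each regressor on all the others and compute 1/(1 − R²). The sketch below builds a deliberately collinear dataset (one column is nearly a copy of another) to show the diagnostic firing.

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor for column j of design matrix X."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)                  # independent of both
X = np.column_stack([x1, x2, x3])

# A VIF above roughly 10 is a common rule-of-thumb red flag
vifs = [vif(X, j) for j in range(3)]
```

Here the first two columns should show very large VIFs while the independent third column stays near 1, matching the intuition that redundant regressors, not merely numerous ones, are the problem.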
Exogenous vs. Endogenous Regressors
In econometrics and structural modeling, regressors are sometimes classified as exogenous or endogenous. An exogenous regressor is determined outside the model. Its value is taken as given and isn’t influenced by the other variables in the system. Think of a policy change, a natural disaster, or a randomly assigned treatment: these are external forces that affect outcomes but aren’t themselves shaped by those outcomes.
An endogenous regressor, by contrast, is influenced by other variables in the model. This creates a feedback loop that standard regression can’t handle cleanly. For example, if you’re modeling the effect of education on income, education is partly determined by family wealth, which is also related to income. That circular relationship makes education an endogenous regressor, and ignoring it produces misleading estimates. Specialized techniques, such as instrumental variables, exist to address this, but the key insight is that not all regressors are equally straightforward to work with. Whether a regressor is truly external to the system or tangled up in it changes how you need to build your model.
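The endogeneity problem, and the instrumental-variables fix, can be simulated. The sketch below is a hypothetical two-stage least squares (2SLS) setup: an unobserved confounder ("ability") drives both education and income, biasing naive OLS, while an instrument assumed to affect education but not income directly recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
instrument = rng.normal(size=n)   # assumed exogenous, e.g. a policy shock
ability = rng.normal(size=n)      # unobserved confounder
education = instrument + ability + rng.normal(size=n)
income = 2.0 * education + 3.0 * ability + rng.normal(size=n)  # true effect: 2.0

def ols_slope(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Naive OLS: biased upward because ability drives both variables
naive = ols_slope(education, income)

# Stage 1: predict the endogenous regressor from the instrument alone
X1 = np.column_stack([np.ones(n), instrument])
fitted = X1 @ np.linalg.lstsq(X1, education, rcond=None)[0]
# Stage 2: regress the outcome on the fitted (purged) values
iv = ols_slope(fitted, income)
```

On this simulated data the naive slope lands well above the true value of 2.0 while the 2SLS estimate sits close to it; the point of the sketch is the structure of the two stages, not the particular numbers.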
What Makes a Good Regressor
A useful regressor has a genuine, interpretable relationship with the outcome. It adds predictive information that the other regressors in the model don’t already capture. It’s measured accurately, since noisy or error-prone regressors weaken a model’s ability to detect real effects. And ideally, it’s not so tightly correlated with other regressors that its individual contribution becomes impossible to untangle.
In practice, choosing regressors is part science and part judgment. Domain knowledge matters as much as statistical criteria. A model predicting hospital readmission rates will perform better if you include regressors that clinicians know to be relevant, rather than blindly throwing in every available data point and hoping the algorithm sorts it out.