A fitted model is a statistical or machine learning model whose parameters have been calculated from data. Before fitting, a model is just a general structure, like the equation y = βX + α, where the values of β and α are unknown. After fitting, those blanks are filled in with specific numbers derived from your dataset, producing something like y = 3X + 5. That concrete equation, ready to make predictions, is the fitted model.
The distinction matters because the same model structure can produce very different fitted models depending on the data you train it on. Understanding what fitting actually does, and what can go wrong, is central to working with any kind of predictive or explanatory model.
How a Model Gets Fitted
Fitting is the process of finding the parameter values that make a model’s predictions match observed data as closely as possible. In a simple linear regression, the parameters are the slope and the intercept. In a neural network, they’re the thousands (or millions) of connection weights between nodes. The concept is the same regardless of scale: adjust the parameters until the model’s output aligns with reality.
The most common fitting methods work by optimizing a mathematical objective. Ordinary least squares, used in most linear regression, finds the parameter values that minimize the total squared difference between predicted and actual values. Maximum likelihood estimation takes a slightly different angle: it finds the parameters that make the observed data most probable. For linear regression with normally distributed errors, the two approaches produce identical estimates.
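As a sketch of what fitting actually computes, here is ordinary least squares for a one-predictor model in closed form. The data are invented for illustration and chosen so the fitted line lands near the y = 3X + 5 example used throughout:

```python
import numpy as np

# Toy data (invented for illustration): y is roughly 3x + 5 plus noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.0])

# OLS closed-form solution for simple linear regression:
# slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# These values minimize the sum of squared differences between
# predicted and observed y over all possible slope/intercept pairs.
```

For this toy dataset the minimizer lands at a slope near 3 and an intercept near 5, turning the general template y = βX + α into a concrete, usable equation.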
Maximum likelihood estimation is one of the most widely used estimation techniques in statistics. It treats the likelihood as a function of the model’s parameters, then searches for the parameter values that maximize it. The resulting values are called maximum likelihood estimates. Under mild regularity conditions, these estimates are consistent, meaning they converge to the true parameter values as you collect more data.
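The search for a likelihood maximum can be sketched numerically. This toy example (invented data, using `scipy.optimize.minimize` as the optimizer) estimates the mean and standard deviation of a sample modeled as normally distributed, by minimizing the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

# Toy sample (invented): observations we model as Normal(mu, sigma)
data = np.array([4.2, 5.1, 4.8, 5.5, 4.9, 5.3])

def neg_log_likelihood(params):
    mu, log_sigma = params        # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    # Negative log-likelihood of the sample under Normal(mu, sigma),
    # dropping constants that do not depend on the parameters
    return 0.5 * np.sum(((data - mu) / sigma) ** 2) + len(data) * np.log(sigma)

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
```

For a normal model this recovers the known closed-form answers: the MLE of the mean is the sample mean, and the MLE of the standard deviation is the (biased) sample standard deviation.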
What a Fitted Model Contains
Once fitting is complete, the model stores a set of learned values. For a regression model, these are coefficients (one per predictor variable) and an intercept. If you fit a regression predicting word recall from two factors, you might get output like this: a coefficient of -3 for one predictor, +2 for another, and an intercept of 11. Each coefficient tells you how much the predicted outcome changes when that predictor shifts by one unit, holding everything else constant. The intercept is the model’s baseline prediction when all predictors equal zero.
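Using the illustrative coefficients above, prediction from a fitted regression is just arithmetic, and the interpretation of each coefficient can be checked directly:

```python
# Coefficients from the hypothetical fitted model in the text
coef_a, coef_b, intercept = -3, 2, 11

def predict(a, b):
    # A prediction is a weighted sum of the predictors plus the intercept
    return coef_a * a + coef_b * b + intercept

baseline = predict(0, 0)   # intercept alone: the prediction when all predictors are 0
shifted = predict(1, 0)    # a one-unit shift in the first predictor
# shifted - baseline equals the first coefficient, -3:
# that is exactly what "change per one-unit shift, holding the rest constant" means
```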
These numbers are the core of the fitted model. They encode everything the model “learned” from the training data. In more complex models like decision trees, the learned structure is a set of splitting rules. In neural networks, it’s a matrix of weights and biases. But the principle is always the same: fitting converts a flexible template into a fixed, usable tool.
Fitted Values vs. the Fitted Model
A related term you’ll encounter is “fitted values.” These are simply the predictions the fitted model produces for the data it was trained on. If your fitted equation is y = 3X + 5 and one of your training data points has X = 5, the fitted value for that point is 20. Fitted values are useful for diagnosing problems. By comparing them to the actual observed values, you can see where the model gets things right and where it struggles. The gaps between fitted and observed values are called residuals, and patterns in those residuals often reveal whether the model structure is appropriate for the data.
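The distinction between fitted values and residuals can be made concrete with the y = 3X + 5 equation from the text and a few invented training points:

```python
import numpy as np

# The fitted model from the text: y = 3X + 5
def fitted_model(x):
    return 3 * x + 5

# Toy training data (invented); X = 5 matches the worked example
X = np.array([1.0, 2.0, 5.0])
y_observed = np.array([8.5, 10.6, 20.4])

fitted_values = fitted_model(X)          # the model's predictions on its own training data
residuals = y_observed - fitted_values   # gaps between observed and fitted values
# For X = 5 the fitted value is 20, as in the text; the residual there is 0.4.
# Patterns in residuals (e.g. a curve rather than random scatter) suggest
# the model structure does not match the data.
```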
Parameters vs. Hyperparameters
One common point of confusion is the difference between parameters and hyperparameters. Parameters are the values the fitting process determines automatically from data, like regression coefficients or neural network weights. Hyperparameters are settings you choose before fitting begins. They control the structure of the model or how the optimization runs. Examples include the number of trees in a random forest, the degree of a polynomial regression, or the learning rate in a neural network.
Adjusting hyperparameters is called tuning, not fitting. Tuning defines the playing field. Fitting is what happens on it. You might try several hyperparameter configurations, fit a model under each one, and then compare the results to find the best combination.
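A minimal sketch of this tune-then-fit loop, using polynomial degree as the hyperparameter and invented noisy quadratic data. The degree is chosen before fitting; `np.polyfit` does the fitting inside each configuration:

```python
import numpy as np

# Toy data (invented): a noisy quadratic, split into train and validation halves
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 40)
y = x**2 + rng.normal(0, 0.3, size=x.shape)
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]

# The polynomial degree is a hyperparameter: set before fitting, not learned from data
results = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x_train, y_train, degree)    # fitting happens here
    val_error = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    results[degree] = val_error                      # compare configurations

best_degree = min(results, key=results.get)  # keep the configuration that generalizes best
```

Each pass through the loop is one fit on one playing field; comparing validation errors across configurations is the tuning.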
Overfitting and Underfitting
A well-fitted model captures the real patterns in your data without memorizing the noise. But things go wrong in two directions.
Overfitting happens when the model learns the training data too well, picking up on random fluctuations as if they were meaningful patterns. An overfitted model performs impressively on training data but poorly on new, unseen data. This is the core paradox of overfitting: the model contains more information about the training set but less useful information about anything else. Complex models with many parameters are especially prone to this, particularly when trained on small datasets.
Underfitting is the opposite problem. It occurs when the model is too simple to capture the actual structure in the data, or when too few predictors are included. An underfitted model does a poor job on both training data and new data because it never learned the real patterns in the first place. This can also happen when the training dataset is simply too small to reveal meaningful relationships.
The practical test is straightforward. A well-fitted model performs about equally well on training data and test data. An overfitted model shows a large gap, with high accuracy in training and noticeably lower accuracy on test data.
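That train/test gap can be measured directly. This sketch (invented data) fits a simple model and a deliberately over-flexible one on the same small training set and compares their gaps; the high-degree polynomial is likely, though not guaranteed, to show the larger one:

```python
import numpy as np

# Toy data (invented): a linear signal plus noise, randomly split 20 train / 10 test
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
y = 2 * x + 1 + rng.normal(0, 0.1, size=30)
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

def mse(coeffs, xs, ys):
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

# A model matched to the data: expect similar train and test error
simple = np.polyfit(x_train, y_train, 1)
# Far too many parameters for 20 noisy points: memorizes training noise
flexible = np.polyfit(x_train, y_train, 9)

gap_simple = mse(simple, x_test, y_test) - mse(simple, x_train, y_train)
gap_flexible = mse(flexible, x_test, y_test) - mse(flexible, x_train, y_train)
```

The flexible model fits the training set better by construction; the question overfitting asks is whether that advantage survives on the held-out points.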
How to Tell If a Model Fits Well
Several metrics quantify goodness of fit. R-squared measures how much of the variation in your outcome the model explains, on a scale from 0 to 1. It’s intuitive but can be misleading because it never decreases as you add more predictors, even useless ones.
Two information-based criteria handle this better by penalizing complexity. AIC (Akaike Information Criterion) balances how well the model fits against how many parameters it uses. BIC (Bayesian Information Criterion) does the same but applies a heavier penalty for additional parameters, especially with large datasets. Lower values indicate a better-fitting model for both metrics. When comparing multiple candidate models on the same data, AIC and BIC help you choose the one that fits well without unnecessary complexity.
For prediction-focused applications, metrics like root mean squared error (RMSE) are more common. RMSE tells you, in the same units as your outcome variable, how far off the model’s predictions typically are.
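The three metrics can all be computed from the residuals of a fit. This sketch uses invented observed and predicted values; the AIC line uses the Gaussian-error form n·ln(RSS/n) + 2k, which is standard for least-squares fits up to an additive constant:

```python
import numpy as np

# Toy observed values and model predictions (invented for illustration)
y = np.array([5.0, 8.0, 11.0, 14.0, 17.0])
y_hat = np.array([5.3, 7.8, 11.4, 13.6, 17.1])
k = 2                      # number of fitted parameters (slope + intercept)
n = len(y)

residuals = y - y_hat
rss = np.sum(residuals ** 2)          # residual sum of squares

# R^2: share of the outcome's variation that the model explains
r_squared = 1 - rss / np.sum((y - y.mean()) ** 2)

# RMSE: typical prediction error, in the same units as y
rmse = np.sqrt(rss / n)

# AIC for a Gaussian error model, up to an additive constant;
# lower is better when comparing models fitted to the same data
aic = n * np.log(rss / n) + 2 * k
```

AIC values are only meaningful relative to other models on the same dataset, so the constant dropped here does not matter for model comparison.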
Validating a Fitted Model
Evaluating a model only on the data used to fit it is unreliable because the model has already seen those examples. The standard solution is to hold out a portion of your data for testing, or better yet, use cross-validation.
In k-fold cross-validation, the dataset is split into k roughly equal groups. The model is trained on k-1 groups and tested on the remaining one, rotating so every group serves as the test set exactly once. The results are averaged to produce a more stable estimate of how the model will perform on new data. This technique helps detect overfitting and provides a realistic picture of the model’s generalization ability. Best practices include proper randomization of the data before splitting, consistent feature scaling across folds, and reporting both mean performance and variability across folds.
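The procedure above can be written out from scratch in a few lines. This sketch uses invented linear data and a simple `np.polyfit` model; libraries like scikit-learn provide the same mechanics ready-made:

```python
import numpy as np

# Toy data (invented): a linear signal plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, size=50)

k = 5
indices = rng.permutation(len(x))      # randomize before splitting
folds = np.array_split(indices, k)     # k roughly equal groups

fold_errors = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    coeffs = np.polyfit(x[train_idx], y[train_idx], 1)  # fit on k-1 groups
    preds = np.polyval(coeffs, x[test_idx])             # test on the held-out group
    fold_errors.append(np.mean((preds - y[test_idx]) ** 2))

# Report both mean performance and variability across folds
mean_mse = np.mean(fold_errors)
std_mse = np.std(fold_errors)
```

Every observation lands in the test set exactly once, so the averaged error reflects performance on data the model did not see during each fit.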
Prediction vs. Inference
Fitted models serve two fundamentally different purposes, and the goal shapes how you build and evaluate them.
In inference, the point is to understand relationships. You want to know whether a specific variable has a real effect on the outcome, and how large that effect is. The fitted coefficients themselves are the product of interest, along with measures of uncertainty like confidence intervals and p-values. Statistical models built for inference prioritize interpretability and correct estimation of those parameters.
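For a simple regression, `scipy.stats.linregress` returns exactly these inference products alongside the fit itself. The data here are invented, with a true slope of 1.5:

```python
import numpy as np
from scipy.stats import linregress

# Toy data (invented): does x have a real effect on y? True slope is 1.5
rng = np.random.default_rng(7)
x = np.linspace(0, 10, 40)
y = 1.5 * x + 2 + rng.normal(0, 1.0, size=40)

result = linregress(x, y)

# For inference, the estimate and its uncertainty are the product of interest
slope = result.slope       # estimated effect of x on y
se = result.stderr         # standard error of that estimate
p_value = result.pvalue    # test of the null hypothesis that the slope is 0
ci_low = slope - 1.96 * se   # rough 95% confidence interval (normal approximation)
ci_high = slope + 1.96 * se
```

The point of the output is not the predictions but the coefficient itself: the interval tells you how precisely the effect is estimated, and the p-value whether it is distinguishable from no effect at all.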
In prediction, the point is accuracy on new data. You care less about what the individual parameters mean and more about whether the model’s outputs are close to reality. Machine learning models built for prediction use techniques like cross-validation, early stopping, and hyperparameter tuning to optimize performance on unseen data. Instead of p-values, you assess them with metrics like RMSE or test set error.
These goals don’t always align. A model optimized for explaining relationships won’t necessarily make the best predictions, and a model tuned for predictive accuracy may contain coefficients that are difficult or impossible to interpret. Two models can share the same structure, like a linear regression, but one is built to explain a coefficient while the other is built to predict an outcome. Knowing which purpose you need determines how you fit, evaluate, and use the model.

