What Is Path Analysis and How Does It Work?

Path analysis is a statistical method for testing how a set of variables is causally connected. Instead of asking only whether two things are related, path analysis maps out a proposed chain of cause and effect, then checks whether the data support that chain. It lets you separate the direct influence one variable has on another from the indirect influence that travels through intermediate variables along the way.

How Path Analysis Works

The basic idea starts with a diagram. You draw a box for each variable and arrows showing which ones you believe cause changes in which others. Those arrows are the “paths,” and each one gets a numerical weight, called a path coefficient, that represents the strength of that relationship. Path coefficients are usually reported as standardized regression weights. A coefficient close to zero means the connection is weak; a coefficient that is larger in absolute value means changes in one variable more strongly predict changes in the next, and its sign tells you the direction of the relationship.

Variables in a path model fall into two categories. Exogenous variables are starting points: nothing else in your model explains why they vary. Think of them as the upstream causes you’re taking as givens. Endogenous variables, by contrast, are the ones your model is trying to explain. Their variation is at least partly determined by other variables in the diagram. A single model can have several of each, and an endogenous variable can serve as both an outcome of one path and a cause along another.
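One way to make the diagram concrete is to store each arrow as a (cause, effect) pair together with its coefficient. The short Python sketch below does that for a made-up model; the variable names and coefficients are hypothetical and chosen only for illustration. It treats a variable as exogenous when no arrow points to it and as endogenous otherwise.

```python
# Hypothetical path diagram: each key is an arrow (cause, effect) and each
# value is an illustrative path coefficient, not an estimate from real data.
paths = {
    ("hs_achievement", "college_gpa"): 0.50,
    ("hs_achievement", "graduation"): 0.30,
    ("college_gpa", "graduation"): 0.40,
    ("family_income", "college_gpa"): 0.20,
}


def classify_variables(paths):
    """Split variables into exogenous (no incoming arrow) and endogenous."""
    variables = {name for arrow in paths for name in arrow}
    endogenous = {effect for (_cause, effect) in paths}
    exogenous = variables - endogenous
    return sorted(exogenous), sorted(endogenous)


exogenous, endogenous = classify_variables(paths)

# Endogenous variables that also send an arrow onward act as both
# an outcome of one path and a cause along another.
causes = {cause for (cause, _effect) in paths}
outcome_and_cause = sorted(v for v in endogenous if v in causes)

print("Exogenous:", exogenous)
print("Endogenous:", endogenous)
print("Both outcome and cause:", outcome_and_cause)
```

In this toy diagram, college_gpa is explained by two upstream variables and in turn helps explain graduation, which is the dual role described above.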

Direct, Indirect, and Total Effects

One of the most useful features of path analysis is its ability to break apart how one variable influences another. Suppose you want to know how high school achievement affects college graduation. Some of that influence is direct: students with stronger high school performance graduate from college at higher rates even when the intermediate steps are held constant. But some of it is indirect, flowing through a middle step like college GPA or choice of major.

Path analysis quantifies all three kinds of effect. The direct effect is the coefficient on the arrow going straight from one variable to the outcome. An indirect effect is the product of the coefficients along a route that passes through one or more intermediate variables; when several such routes exist, their products are added together. The total effect is the sum of the direct effect and all indirect effects. In one UCLA teaching example, the total effect of high school achievement on college graduation was 0.487, with a direct effect of 0.372 and an indirect effect of 0.114 (the components sum to 0.486, a difference presumably due to rounding). That tells you roughly a quarter of the total influence (0.114 / 0.487 ≈ 0.23) traveled through intermediate variables rather than arriving directly.
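The arithmetic can be sketched in a few lines of Python. The coefficients below are hypothetical (they are not the UCLA estimates, whose individual path values are not given here), and the helper names are invented for this illustration. The sketch assumes the diagram has no feedback loops, multiplies coefficients along each route, and then adds the routes up.

```python
# Illustrative coefficients only (hypothetical, not the UCLA estimates).
paths = {
    ("hs_achievement", "college_gpa"): 0.50,
    ("college_gpa", "graduation"): 0.40,
    ("hs_achievement", "graduation"): 0.30,
}


def route_products(paths, source, target):
    """Yield (route, product of coefficients) for every directed route
    from source to target. Assumes the diagram contains no cycles."""
    def walk(node, route, product):
        if node == target:
            yield route, product
            return
        for (cause, effect), coef in paths.items():
            if cause == node:
                yield from walk(effect, route + [effect], product * coef)

    yield from walk(source, [source], 1.0)


def decompose(paths, source, target):
    """Return (direct, indirect, total) effects of source on target."""
    direct = paths.get((source, target), 0.0)
    # Routes with more than two nodes pass through an intermediate variable.
    indirect = sum(
        product
        for route, product in route_products(paths, source, target)
        if len(route) > 2
    )
    return direct, indirect, direct + indirect


direct, indirect, total = decompose(paths, "hs_achievement", "graduation")
print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
print(f"share of total that is indirect: {indirect / total:.2f}")
```

With these made-up numbers the single indirect route contributes 0.50 × 0.40 = 0.20, which is added to the direct effect of 0.30.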

How It Differs From Regression

Standard regression tells you how strongly a set of predictors relates to a single outcome. Path analysis extends that logic to an entire network of relationships at once. You can model a variable that is both an outcome of one predictor and a cause of another, something a single regression equation can’t handle cleanly. This makes path analysis especially suited to questions where you suspect a causal chain rather than a simple list of independent predictors.
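To see the difference in practice, here is a minimal Python sketch on simulated data. Everything in it is an assumption made for illustration: the variable names x, m, and y, the generating coefficients, and the choice to estimate each equation by ordinary least squares, which is one common way to fit a simple path model with observed variables and no feedback loops. A single regression of y on x returns only the total effect, while the pair of regressions behind the path model splits that total into a direct part and an indirect part.

```python
import random

random.seed(0)
n = 5000

# Simulated data for a hypothetical chain x -> m -> y plus a direct x -> y path.
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.4 * mi + random.gauss(0, 1) for xi, mi in zip(x, m)]


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / (len(u) - 1)


def slope(outcome, predictor):
    """OLS slope from regressing outcome on a single predictor."""
    return cov(predictor, outcome) / cov(predictor, predictor)


def two_slopes(outcome, p1, p2):
    """OLS slopes from regressing outcome on two predictors at once."""
    s11, s22, s12 = cov(p1, p1), cov(p2, p2), cov(p1, p2)
    s1y, s2y = cov(p1, outcome), cov(p2, outcome)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


a = slope(m, x)                  # path x -> m
direct, b = two_slopes(y, x, m)  # paths x -> y (direct) and m -> y
total_single = slope(y, x)       # one regression: total effect only

print(f"a={a:.3f}  b={b:.3f}  direct={direct:.3f}")
print(f"indirect (a*b)={a * b:.3f}  direct + indirect={direct + a * b:.3f}")
print(f"slope of y on x alone={total_single:.3f}")
```

The last two printed values agree: the single-regression slope equals the direct effect plus the product a × b, but only the path model shows how that total is divided. The coefficients here are unstandardized; standardizing the variables first would give standardized path coefficients.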

That said, path analysis doesn’t prove causation on its own. It tests whether the pattern of relationships in your data is consistent with the causal story you proposed. If the data don’t fit the model, the theory needs revising. If they do fit, it means the theory is plausible, not that it’s the only explanation.

Path Analysis vs. Structural Equation Modeling

You’ll often see path analysis and structural equation modeling (SEM) mentioned together, and the terms are sometimes used interchangeably. The key distinction is what kinds of variables the model includes. Path analysis in its classic form works with observed, directly measured variables: test scores, income, hours of exercise, graduation rates. SEM adds a layer by incorporating latent variables, which are theoretical constructs you can’t measure directly (like “intelligence” or “job satisfaction”) but can estimate from a cluster of observable indicators.

In practice, path analysis is a special case of SEM. If your structural equation model contains only observed variables and no latent constructs, it’s essentially a path analysis. Many researchers start with path analysis to test a causal framework and move to full SEM when they need to account for measurement error or model abstract concepts.

Evaluating Whether Your Model Fits

Once you specify a path model and run it, you need to check whether the proposed relationships actually match the data. Researchers rely on a set of fit indices, numerical scores that summarize how well the model reproduces the observed patterns.
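The idea of a model "reproducing" the data can be illustrated with a small Python sketch. The setup is hypothetical: the data are simulated with a direct x -> y effect, but the proposed model contains only the chain x -> m -> y, with its two paths estimated by simple regression. Under that model, the only connection between x and y runs through m, so the model-implied covariance of x and y is a × b × var(x). Comparing it with the observed covariance exposes the misfit.

```python
import random

random.seed(1)
n = 5000

# Hypothetical data-generating process that DOES include a direct x -> y path.
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.4 * mi + random.gauss(0, 1) for xi, mi in zip(x, m)]


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / (len(u) - 1)


# Proposed model: x -> m -> y only (no direct arrow from x to y).
a = cov(x, m) / cov(x, x)  # estimated path x -> m
b = cov(m, y) / cov(m, m)  # estimated path m -> y

observed_xy = cov(x, y)
implied_xy = a * b * cov(x, x)  # the only route from x to y runs through m
residual = observed_xy - implied_xy

print(f"observed cov(x, y) = {observed_xy:.3f}")
print(f"implied  cov(x, y) = {implied_xy:.3f}")
print(f"residual           = {residual:.3f}")
```

A residual well away from zero signals that the chain-only model leaves part of the x-y relationship unexplained. Fit indices summarize discrepancies of this kind across the whole covariance matrix.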

Two of the most commonly reported are the CFI (Comparative Fit Index) and the RMSEA (Root Mean Square Error of Approximation). For the CFI, values range from 0 to 1, and a score of 0.95 or higher is widely considered good fit. An older convention accepted 0.90, but that threshold has largely been replaced. For RMSEA, lower is better: values below 0.06 indicate good fit, values up to about 0.08 are considered fair, and anything above 0.10 is poor. A widely cited set of guidelines recommends reporting at least one relative (or incremental) fit index, like the CFI, alongside one absolute fit index, like the RMSEA or the SRMR (Standardized Root Mean Square Residual), where values below 0.08 signal acceptable fit. No single number tells the full story, so researchers typically report several indices together.
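As a rough illustration, the Python sketch below computes the CFI and RMSEA from chi-square statistics using their standard textbook formulas and then applies the cutoffs mentioned above. The input numbers are invented, the function names are hypothetical, and two details are assumptions: the RMSEA uses N - 1 in the denominator (some programs use N), and the label "marginal" for RMSEA values between 0.08 and 0.10 is a placeholder, since the guidelines quoted here leave that band unnamed.

```python
def rmsea(chi2, df, n):
    """RMSEA from the model chi-square (assumes the N - 1 convention)."""
    return (max(chi2 - df, 0.0) / (df * (n - 1))) ** 0.5


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """CFI: improvement of the model over a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_baseline == 0 else 1.0 - d_model / d_baseline


def describe_fit(cfi_value, rmsea_value, srmr_value):
    """Label each index using the cutoffs quoted in the text."""
    if rmsea_value < 0.06:
        rmsea_label = "good"
    elif rmsea_value <= 0.08:
        rmsea_label = "fair"
    elif rmsea_value <= 0.10:
        rmsea_label = "marginal"  # assumed label for the unnamed 0.08-0.10 band
    else:
        rmsea_label = "poor"
    return {
        "CFI": "good" if cfi_value >= 0.95 else "below 0.95",
        "RMSEA": rmsea_label,
        "SRMR": "acceptable" if srmr_value < 0.08 else "0.08 or above",
    }


# Hypothetical results for a model fitted to 400 cases.
cfi_value = cfi(chi2_model=8.0, df_model=5, chi2_baseline=310.0, df_baseline=10)
rmsea_value = rmsea(chi2=8.0, df=5, n=400)
report = describe_fit(cfi_value, rmsea_value, srmr_value=0.03)

print(f"CFI={cfi_value:.3f}  RMSEA={rmsea_value:.3f}")
print(report)
```

In real work these indices come straight from the SEM software; the point of the sketch is only to show that each one is a simple function of the chi-square statistics and that the cutoffs are conventions rather than strict tests.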

Where Path Analysis Gets Used

Path analysis shows up across a wide range of fields. In psychology and education, it’s a standard tool for modeling how variables like socioeconomic status, parenting style, and school environment combine to influence academic outcomes. In public health, researchers use it to trace how risk factors like diet, physical activity, and stress connect through intermediate biological markers to produce health outcomes.

Neuroscience offers a particularly vivid example. Researchers have applied path analysis to brain imaging data to map how signals travel between brain regions. In one study comparing people with schizophrenia to healthy controls, path analysis revealed that while many of the same brain regions were active in both groups, the routes signals took between those regions differed. In the schizophrenia group, certain connections within visual and subcortical networks were missing, and new, atypical connections appeared between regions that weren’t linked in the control group. The analysis identified specific disrupted links, such as between the thalamus and the caudate, where the paths existed in both groups but followed significantly different trajectories. This kind of work illustrates why path analysis is valuable: it doesn’t just flag that two groups differ, it pinpoints where in a network the differences emerge.

Software for Running Path Analysis

Several well-established software options exist for path analysis. Mplus is one of the most popular dedicated programs, known for its relatively simple syntax and support for advanced models. LISREL and AMOS are other proprietary options with long track records in the social sciences. On the free and open-source side, the lavaan package in R is widely used and handles both path analysis and full structural equation modeling. OpenMx is another R-based option. For researchers who prefer Mplus but want the data management flexibility of R, the MplusAutomation package bridges the two by letting you set up, run, and interpret Mplus models from within R.

If you’re learning path analysis for the first time, lavaan in R and AMOS (which has a visual interface for drawing path diagrams) are common starting points. The choice often comes down to what your department or field already uses, since the underlying statistical methods are the same across platforms.