Factor analysis is a statistical method that takes a large set of variables and identifies a smaller number of underlying patterns, called factors, that explain why those variables are related to each other. If you’ve ever filled out a 30-question survey and seen your results summarized into three or four scores, factor analysis is likely the technique behind that simplification. It works by examining the correlations between all the variables and finding clusters of items that move together, suggesting they share a common hidden cause.
What Factor Analysis Actually Does
Imagine you give 500 people a questionnaire with 20 questions about their mood. Some questions ask about sadness, others about sleep, others about appetite, and others about motivation. You’d expect certain answers to cluster together: people who report feeling sad also tend to report low motivation. Factor analysis detects these clusters mathematically by analyzing the full matrix of correlations between every pair of questions. It then identifies a smaller number of dimensions, or factors, that account for those patterns. In this example, it might reveal two factors: one capturing “emotional distress” and another capturing “physical symptoms.”
The core goal is data reduction without losing meaningful information. Instead of interpreting 20 separate responses, you can work with two or three factor scores that capture the essence of the original data. This makes the results easier to analyze, easier to visualize, and easier to use in further research.
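To make this concrete, here is a minimal sketch in Python using the third-party factor_analyzer package (an assumption on my part; any EFA implementation would work). It simulates 500 respondents whose answers to ten made-up items are driven by two hidden factors, fits a two-factor model, and reduces each respondent's ten answers to two factor scores.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer  # third-party: pip install factor_analyzer

rng = np.random.default_rng(0)
n = 500

# Two hidden causes: emotional distress and physical symptoms.
distress = rng.normal(size=n)
physical = rng.normal(size=n)

# Ten observed items, each driven mostly by one latent factor plus noise.
items = {}
for i in range(5):
    items[f"mood_q{i + 1}"] = 0.8 * distress + rng.normal(scale=0.6, size=n)
    items[f"body_q{i + 1}"] = 0.8 * physical + rng.normal(scale=0.6, size=n)
df = pd.DataFrame(items)

# Exploratory factor analysis: extract two factors from the item correlations.
fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(df)

print(fa.loadings_)        # how strongly each item is tied to each factor
scores = fa.transform(df)  # each respondent reduced from 10 answers to 2 factor scores
print(scores.shape)        # (500, 2)
```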
Exploratory vs. Confirmatory Factor Analysis
There are two broad types, and they serve complementary purposes.
Exploratory factor analysis (EFA) is used when you don’t yet know how your variables group together. You’re essentially asking the data to reveal its own structure. This is common early in research, especially when developing a new survey or questionnaire. You collect responses and let the analysis show you how many factors exist and which items belong to each one.
Confirmatory factor analysis (CFA) is used when you already have a theory about the structure and want to test whether the data actually fits it. If previous research suggested your mood questionnaire should have exactly two factors, CFA checks that hypothesis against a new dataset. A poor fit means your proposed structure doesn’t hold up, and the instrument may need revision. Researchers often use EFA first on one sample, then CFA on a separate sample to verify the results.
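If you want to test a hypothesized structure in code, one option is the third-party semopy package. The sketch below is illustrative rather than definitive: the item names match the simulated questionnaire above, the second sample is simulated, and the model syntax assumes semopy's lavaan-style notation.

```python
import numpy as np
import pandas as pd
import semopy  # third-party SEM package; one option for running CFA in Python

# Simulate an independent second sample with the same hypothetical two-factor structure.
rng = np.random.default_rng(1)
n = 500
distress = rng.normal(size=n)
physical = rng.normal(size=n)
items = {}
for i in range(5):
    items[f"mood_q{i + 1}"] = 0.8 * distress + rng.normal(scale=0.6, size=n)
    items[f"body_q{i + 1}"] = 0.8 * physical + rng.normal(scale=0.6, size=n)
df_new = pd.DataFrame(items)

# The hypothesised structure, written in lavaan-style syntax: each factor "=~" its items.
model_desc = """
distress =~ mood_q1 + mood_q2 + mood_q3 + mood_q4 + mood_q5
physical =~ body_q1 + body_q2 + body_q3 + body_q4 + body_q5
"""

cfa = semopy.Model(model_desc)
cfa.fit(df_new)
print(semopy.calc_stats(cfa))  # fit indices such as CFI and RMSEA indicate whether the structure holds
```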
How to Read the Output
Factor analysis produces several key numbers that tell you whether it worked and what it found.
Factor loadings are the most important. A loading is simply the correlation between an individual variable (like a survey question) and the underlying factor. A loading of 0.70 means that variable is strongly tied to that factor. A loading of 0.15 means the connection is weak. Most researchers treat loadings of 0.40 or above as meaningful, though some use a stricter cutoff of 0.50 or a more lenient one of 0.30, depending on the context.
Eigenvalues tell you how much total variance a single factor explains across all the variables. If you have eight items and the first factor has an eigenvalue of 3.06, that factor explains about 38% of the total variance (3.06 divided by 8). The second factor might explain only 6%. Higher eigenvalues mean more explanatory power.
Variance explained is the practical bottom line. You can square any individual loading to see how much of that specific variable’s variance the factor accounts for. A loading of 0.66, for instance, means the factor explains about 44% of that item’s variance. When evaluating the overall model, researchers look at the cumulative variance explained by all retained factors. A solution explaining 50% to 60% of total variance is generally considered adequate in the social sciences.
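The arithmetic behind these numbers is straightforward. The sketch below uses a hypothetical loadings matrix (made-up values) to show how squared loadings, per-factor explained variance, and cumulative variance explained fit together.

```python
import numpy as np

# Hypothetical loadings for 8 items on 2 factors (rows = items, columns = factors).
loadings = np.array([
    [0.72, 0.10],
    [0.66, 0.05],
    [0.70, 0.12],
    [0.61, 0.08],
    [0.15, 0.68],
    [0.09, 0.71],
    [0.11, 0.64],
    [0.07, 0.59],
])

# Squaring a single loading: the share of that item's variance the factor explains.
print(0.66 ** 2)  # ~0.44, i.e. roughly 44% of that item's variance

# Summing squared loadings down a column gives each factor's explained variance
# (reported by most packages as that factor's eigenvalue after extraction).
eigenvalues = (loadings ** 2).sum(axis=0)
print(eigenvalues)

# Dividing by the number of items gives the proportion of total variance per factor,
# and the running total is the cumulative variance explained.
proportion = eigenvalues / loadings.shape[0]
print(proportion)
print(proportion.cumsum())
```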
Deciding How Many Factors to Keep
One of the trickiest decisions in factor analysis is choosing the right number of factors. Too few and you lose important distinctions in the data. Too many and you’re modeling noise.
The most widely used rule is the Kaiser-Guttman criterion: keep every factor with an eigenvalue greater than 1.0. The logic is that a factor should explain at least as much variance as a single original variable would on its own. This rule is simple but sometimes retains too many or too few factors, so researchers rarely rely on it alone.
The scree plot offers a visual alternative. It graphs eigenvalues on the vertical axis and factor number on the horizontal axis. You look for the “elbow,” the point where the line flattens out. Factors before the elbow are considered meaningful, while factors after it represent error or trivial variation. The downside is that reading a scree plot involves subjective judgment, and reasonable people can disagree on where the elbow falls. In practice, most researchers use both methods together and also consider whether the resulting factors make conceptual sense.
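Both rules are easy to compute from the correlation matrix. Here is a small sketch (assuming matplotlib is available and that df is a DataFrame of item responses, such as the simulated questionnaire from the first example) that applies the Kaiser-Guttman cutoff and draws a scree plot.

```python
import numpy as np
import matplotlib.pyplot as plt

def kaiser_and_scree(df):
    """Apply the Kaiser-Guttman rule and draw a scree plot for a DataFrame of item responses."""
    corr = np.corrcoef(df.values, rowvar=False)             # correlation matrix of the items
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]   # eigenvalues, largest first

    # Kaiser-Guttman criterion: keep every factor with an eigenvalue above 1.0.
    n_keep = int((eigenvalues > 1.0).sum())
    print(f"Kaiser-Guttman suggests keeping {n_keep} factor(s)")

    # Scree plot: look for the elbow where the curve flattens out.
    plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
    plt.axhline(1.0, linestyle="--")                         # reference line for the Kaiser rule
    plt.xlabel("Factor number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()

kaiser_and_scree(df)  # df: the questionnaire DataFrame from the earlier sketch
```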
What Rotation Does and Why It Matters
The initial factor solution that comes out of the math is often hard to interpret because variables load moderately on multiple factors rather than loading strongly on just one. Rotation is a technique that adjusts the factors to produce a cleaner, more interpretable pattern without changing the overall amount of variance explained.
Orthogonal rotation, most commonly a method called varimax, keeps the factors completely uncorrelated with each other. Varimax maximizes the variance of the squared loadings within each factor, so that each variable ends up loading high on one factor and low on the others. This produces the cleanest, simplest structure and is the most popular rotation method by a wide margin.
Oblique rotation, such as promax, allows the factors to be correlated. This is more realistic in many real-world situations. Anxiety and depression, for example, are distinct constructs but clearly related. If you force them to be uncorrelated through orthogonal rotation, you may distort the true picture. Promax works by taking the varimax solution and raising the loadings to a power (typically between 2 and 4), which drives small loadings toward zero and produces a simpler target pattern, then relaxing the requirement that factors stay independent as it rotates toward that target.
Choosing between them depends on your subject matter. If you have strong reason to believe the underlying factors are independent, use varimax. If the factors are likely related, as they often are in psychology and health research, oblique rotation gives a more honest result.
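With the factor_analyzer package used earlier, switching between the two is a one-argument change. The sketch below refits the simulated questionnaire data (df from the first example) with each rotation and, for promax, prints the estimated factor correlations (assuming the package exposes them as phi_).

```python
from factor_analyzer import FactorAnalyzer

# Fit the same two-factor model twice, changing only the rotation.
fa_varimax = FactorAnalyzer(n_factors=2, rotation="varimax")  # factors forced to be uncorrelated
fa_varimax.fit(df)

fa_promax = FactorAnalyzer(n_factors=2, rotation="promax")    # factors allowed to correlate
fa_promax.fit(df)

print(fa_varimax.loadings_)
print(fa_promax.loadings_)

# With an oblique rotation, the correlation between factors is estimated rather than fixed at zero.
print(fa_promax.phi_)  # factor correlation matrix
```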
Checking Whether Your Data Is Suitable
Not every dataset is appropriate for factor analysis. Before running it, two preliminary tests help you decide whether to proceed.
The Kaiser-Meyer-Olkin (KMO) measure assesses whether your variables share enough common variance to make factor analysis worthwhile. KMO values range from 0 to 1, with anything above 0.60 considered acceptable. Values above 0.80 are generally rated as good, and above 0.90 as excellent. A low KMO means your variables don’t correlate enough with each other for factors to emerge.
Bartlett’s test of sphericity checks whether your correlation matrix is significantly different from an identity matrix, which is a matrix where no variables correlate at all. You want this test to be statistically significant (p less than 0.001). If it’s not, your variables are essentially independent and there are no underlying factors to find.
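Both checks are available in the factor_analyzer package used in the earlier sketches; the snippet below assumes df is the questionnaire DataFrame from the first example.

```python
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# df: the questionnaire DataFrame from the first sketch.
chi_square, p_value = calculate_bartlett_sphericity(df)
kmo_per_item, kmo_overall = calculate_kmo(df)

print(f"Bartlett's test: chi2 = {chi_square:.1f}, p = {p_value:.3g}")  # should be significant
print(f"Overall KMO = {kmo_overall:.2f}")                              # want at least ~0.60
```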
Factor analysis also assumes that relationships between variables are roughly linear, that the data comes from a reasonably large sample, and that observations are independent of each other. Regarding sample size, older rules of thumb suggest minimums of 100 to 200 participants or 5 to 10 observations per variable, but the actual requirement depends heavily on how strongly items load onto factors. With strong loadings of 0.80, a one-factor model with four indicators can work with as few as 60 participants. With weaker loadings of 0.50, that same model needs around 190. More complex models with multiple factors require substantially larger samples, sometimes 200 to 460 or more.
Where Factor Analysis Gets Used
Factor analysis shows up across a wide range of fields, but its most visible application is in developing and validating questionnaires and scales. In health research, it’s essential for confirming that a patient survey actually measures what it claims to measure. One well-known example involved the Coping Self-Efficacy scale, where researchers administered 26 items to participants in one clinical trial, used exploratory factor analysis to discover the factor structure, then tested that structure on an independent sample from a second trial using confirmatory factor analysis.
Sleep research in Parkinson’s disease offers another illustration. Researchers tested whether the Parkinson’s Disease Sleep Scale worked better as a single overall score, a three-dimensional scale (measuring insomnia, motor symptoms, and sleep behavior disorder separately), or a hybrid model combining a general sleep factor with the three subscales. Only the hybrid model fit the data well, which changed how clinicians should interpret the scores: the total score alone wasn’t a reliable indicator of sleep quality, but the subscale scores were.
Outside of health, factor analysis underpins personality tests like the Big Five, intelligence assessments, customer satisfaction surveys, and market research. Anytime someone needs to take a complex set of measurements and reduce them to a manageable number of meaningful dimensions, factor analysis is the standard tool.

