Turning qualitative data into quantitative data involves assigning numerical values to text, observations, or other non-numerical information so you can count, compare, and statistically analyze it. Researchers call this process “quantitizing,” and it ranges from simple frequency counts of themes to more complex intensity scoring on ordinal scales. The approach you choose depends on what kind of qualitative data you have and what you want to do with the numbers once you have them.
Start With Coding: The Foundation Step
Nearly every method for quantitizing qualitative data begins with coding. You read through your text (interview transcripts, survey responses, field notes, social media posts) and attach labels to segments that represent a concept, theme, or category. These codes can come from a predefined framework you built before looking at the data, or they can emerge organically as you read. Most projects use a combination of both.
Once your data is coded, you already have something countable. The simplest quantification is a frequency count: how many times does a particular theme appear across your dataset, and how many participants expressed it? This alone can be surprisingly useful. If 34 out of 50 interview participants mention cost as a barrier to treatment, that number communicates something a narrative summary cannot.
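As a minimal sketch of this first step, theme frequencies can be tallied directly from coded data; the participant IDs and theme labels below are invented for illustration:

```python
from collections import Counter

# Hypothetical coded data: each participant maps to the set of themes
# a coder attached to their transcript.
coded = {
    "P01": {"cost", "anxiety"},
    "P02": {"cost"},
    "P03": {"access", "anxiety"},
    "P04": {"cost", "access"},
    "P05": {"cost"},
}

# Participant-level frequency: how many participants expressed each theme.
theme_counts = Counter(theme for themes in coded.values() for theme in themes)

for theme, n in theme_counts.most_common():
    print(f"{theme}: {n}/{len(coded)} participants")
```

Because each participant contributes a set of themes, a theme mentioned five times by one person still counts once for that person, which is usually what "how many participants expressed it" should mean.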
The key to reliable coding is consistency. If you’re the only person coding, work through a portion of your data, take a break, then recode the same portion and check whether you assigned the same labels both times. If you’re working with a team, you need formal agreement checks. Cohen’s kappa is the standard measure for two coders, and Fleiss’ kappa adapts the same logic for three or more. A kappa of 0.60 or above is generally considered the minimum for trustworthy results. Below that threshold, roughly half the coded data may be unreliable. Scores above 0.80 indicate strong agreement, and anything above 0.90 is nearly perfect. Build a detailed coding manual with definitions and examples for each code before your team starts, and refine it through rounds of independent coding followed by discussion until disagreements are resolved.
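Cohen's kappa is simple enough to compute by hand: observed agreement corrected for the agreement two coders would reach by chance given their own label frequencies. A minimal sketch, with invented labels from two hypothetical coders marking theme presence (1) or absence (0) across ten segments:

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels on the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    labels = set(coder_a) | set(coder_b)
    # Observed agreement: proportion of items both coders labeled identically.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: probability both coders pick the same label at random,
    # given each coder's own marginal label frequencies.
    p_expected = sum(
        (coder_a.count(lab) / n) * (coder_b.count(lab) / n) for lab in labels
    )
    return (p_observed - p_expected) / (1 - p_expected)

coder_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
coder_b = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.58
```

These two coders agree on 8 of 10 segments (80% raw agreement), yet their kappa of 0.58 falls just under the 0.60 threshold, which is exactly why raw agreement alone is misleading.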
Binary Coding: Yes or No
The most straightforward conversion is binary coding, where you create a variable that equals 1 if a theme is present and 0 if it’s absent. For each participant or data source, you simply mark whether a given concept showed up. Did the respondent mention anxiety about the procedure? 1 or 0. Did the customer review reference shipping speed? 1 or 0.
Binary variables (sometimes called dummy variables) slot directly into statistical models like regression. If you have a category with more than two options, say three types of coping strategies, you create a separate binary variable for each type but leave one out as the reference group. Including all categories simultaneously creates what statisticians call a “dummy variable trap,” where the variables are perfectly correlated and the model breaks down. So if you have three categories, you create two binary variables, and the third is implied when both are zero.
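A small sketch of this reference-group encoding, using an invented three-level coping-strategy code; the category names are hypothetical:

```python
# Hypothetical three-level categorical code from a coding manual.
CATEGORIES = ["problem", "emotion", "avoidant"]
REFERENCE = "problem"  # left out of the model; implied when all dummies are 0

def dummy_encode(value):
    """Return one 0/1 indicator per non-reference category."""
    return {
        f"is_{cat}": int(value == cat)
        for cat in CATEGORIES
        if cat != REFERENCE
    }

print(dummy_encode("emotion"))   # {'is_emotion': 1, 'is_avoidant': 0}
print(dummy_encode("problem"))   # {'is_emotion': 0, 'is_avoidant': 0}
```

Note that the reference category "problem" produces all zeros rather than its own column, which is what keeps the encoded variables out of the dummy variable trap.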
Intensity and Strength Scoring
Sometimes presence or absence isn’t enough. You want to capture how strongly a theme shows up, not just whether it does. Intensity scoring assigns values on an ordinal scale, typically 1 to 5, where 1 represents low intensity and 5 represents high intensity. A participant who mentions stress in passing gets a 2; a participant who describes stress as overwhelming and life-altering gets a 5.
This approach works well for qualitative data that contains what researchers call “vague quantifiers,” phrases like “many women reported,” “a strong theme,” “the majority of participants,” or “a recurrent finding.” These phrases imply magnitude without giving exact numbers. You can systematically convert them into ordinal scores by building a rubric. For instance, “a few” might map to a 2, “many” to a 4, and “almost all” to a 5. There are two distinct types of vague quantifiers to watch for: sample quantifiers that reference how many people expressed something (“most participants,” “a small number”), and relationship quantifiers that describe how strongly two concepts are connected (“strongly associated with,” “loosely related to”). Each type needs its own scoring rubric because they measure different things.
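A rubric like this is easy to encode as a lookup table; the phrase-to-score mappings below are an invented example, not a validated instrument:

```python
# Hypothetical rubric for sample quantifiers (how many people expressed
# something), mapped to an ordinal 1-5 scale.
SAMPLE_RUBRIC = {
    "one or two": 1,
    "a few": 2,
    "some": 3,
    "many": 4,
    "most": 4,
    "almost all": 5,
}

def score_quantifier(phrase, rubric=SAMPLE_RUBRIC):
    """Return the ordinal score for a vague quantifier, or None if unmapped.
    Unmapped phrases should be flagged for coder discussion, not guessed."""
    return rubric.get(phrase.lower().strip())

print(score_quantifier("Many"))        # 4
print(score_quantifier("almost all"))  # 5
```

Returning None for unmapped phrases, rather than a default score, forces the team to extend the rubric deliberately; a parallel table would hold the relationship quantifiers, since the two types measure different things.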
The reliability checks matter even more here than with binary coding, because intensity judgments are inherently more subjective. Have multiple coders score the same passages independently, calculate kappa, and don’t proceed until agreement is solid.
Sentiment Analysis for Text Data
If your qualitative data is text and you care about positive versus negative tone, sentiment analysis automates much of the quantification. The basic logic is simple: count the positive words and negative words in a passage, then calculate a score from the ratio.
One common formula takes the difference between positive and negative word counts, then divides by the total word count. This produces a score that adjusts for document length. Another approach divides the positive word count by the negative word count plus one (the “plus one” prevents dividing by zero when there are no negative words). Both methods give you a single number per text passage that you can use in further analysis.
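Both formulas are a few lines of code. The toy lexicon below is invented for illustration; real analyses use validated sentiment lexicons with thousands of entries:

```python
# Toy sentiment lexicon (hypothetical; far smaller than a real one).
POSITIVE = {"good", "great", "helpful", "easy", "fast"}
NEGATIVE = {"bad", "slow", "confusing", "expensive", "painful"}

def sentiment_scores(text):
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {
        # (positive - negative) / total words: adjusts for document length
        "length_adjusted": (pos - neg) / len(words),
        # positive / (negative + 1): the +1 prevents division by zero
        "ratio": pos / (neg + 1),
    }

scores = sentiment_scores("great support but slow and expensive shipping")
print(scores)
```

With one positive and two negative hits in seven words, the length-adjusted score comes out slightly negative while the ratio score stays below 1, and either number can feed directly into downstream analysis.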
More sophisticated tools like VADER (a widely used sentiment engine) produce a compound score normalized between -1 and +1, where -1 is maximally negative, +1 is maximally positive, and 0 is neutral. VADER was specifically designed for social media text and handles things like capitalization, exclamation points, and slang better than simple word-counting approaches. For formal text like interview transcripts, simpler methods often work just as well.
Building a Coding Manual
The coding manual is the document that makes your conversion reproducible. It defines every code, provides inclusion and exclusion criteria, gives example text passages for each code, and specifies the numerical value assigned. Without one, two people reading the same transcript will produce different numbers, and your quantified data won’t mean much.
Developing the manual is iterative. Start by reading a subset of your data and drafting initial codes. Have your team independently code a small batch using those definitions. Compare results, discuss disagreements, and revise the definitions until they’re specific enough to produce consistent coding. This cycle of code-compare-revise typically takes two to four rounds before the manual stabilizes. Only then should you code the full dataset.
Software That Helps
Qualitative data analysis software can speed up coding and make the transition to numbers smoother. NVivo and ATLAS.ti are the most established tools, offering features for tagging text, organizing codes into hierarchies, and exporting coded data in formats compatible with statistical software. Dedoose is a lighter-weight alternative that covers core coding features and includes built-in mixed methods visualization. Newer AI-powered tools like Skimle can automate parts of the coding process, extracting themes from large datasets and organizing them into hierarchical categories. These tools export in standardized formats that other analysis software can read.
A word of caution on AI-assisted coding: recent evaluations of large language models like GPT-4 found that while AI-generated themes generally didn’t contradict human-derived themes, the AI sometimes elevated minor observations to the level of a theme when human coders wouldn’t. Performance was low and inconsistent when it came to selecting quotes that genuinely supported the identified themes. Hallucinations, where the AI subtly changed wording or combined text fragments in ways that altered meaning, were also observed. AI can be a useful starting point for exploring a large dataset, but it’s not yet a replacement for human judgment in qualitative coding.
What You Can Do With the Numbers
Once your qualitative data is quantified, the type of analysis you can run depends on how you scored it. Binary coded data works with chi-square tests (comparing proportions across groups) and logistic regression (predicting whether a theme is present based on other variables). Frequency counts can be compared across categories or correlated with other measures. Ordinal intensity scores support non-parametric tests that respect rank ordering without assuming equal intervals between scores, such as the Mann-Whitney U test for two groups or the Kruskal-Wallis test for three or more.
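As one concrete illustration, a chi-square statistic for binary coded data in a 2x2 table can be computed directly; the group sizes and counts below are invented:

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]], without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # Shortcut formula for 2x2 tables:
    # n * (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical example: a "cost barrier" theme present or absent
# in two groups of 50 participants each.
#                 present  absent
table = [[30, 20],   # group 1
         [15, 35]]   # group 2
print(round(chi_square_2x2(table), 2))  # 9.09
```

The resulting statistic is compared against a chi-square distribution with one degree of freedom to get a p-value; in practice a library routine such as scipy.stats.chi2_contingency handles that step (and larger tables) for you.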
The numbers you produce through quantitizing are not the same as numbers collected through surveys or experiments. They carry the interpretive decisions you made during coding. This isn’t a weakness, but it means transparency matters. Anyone reviewing your work should be able to see your coding manual, your reliability scores, and the decision rules you followed. The conversion from words to numbers always involves judgment calls. Documenting those calls is what separates rigorous quantitizing from arbitrary number assignment.

