What Is Paradata and Why Do Researchers Use It?

Paradata is data generated as a by-product of the data collection process. When someone fills out a survey, takes a census, or completes a questionnaire online, paradata captures not what they answered, but how they answered: how long they spent on each page, where they clicked, what device they used, and how many attempts it took to reach them. The term was coined by survey methodologist Mick Couper in 1998 to describe the audit trails automatically generated by computer-assisted interviewing systems, and it has since expanded well beyond surveys into fields like digital heritage, public health surveillance, and clinical research.

How Paradata Differs From Regular Data and Metadata

Think of three layers. The data itself is the content: a respondent’s answers, a 3D scan of a historical artifact, a patient’s health record. Metadata describes the context of that content: who created it, when, in what format, for what purpose. Paradata captures the process: the decisions, steps, and behaviors that occurred while the data was being created or collected.

In a digital heritage project, for example, metadata might record the file format, creator name, and date of a 3D model. Paradata would document the methodology used to build that model, which reference images the creator relied on, what choices were made when parts of the object weren’t visible, and how long the modeling took. In a survey context, metadata might note when a questionnaire was published and how many questions it contains. Paradata would record how each respondent navigated through those questions in real time.

Common Types of Paradata

The specific paradata generated depends entirely on the system doing the collecting. The U.S. Census Bureau notes that paradata ranges from contact attempt histories for interviewer-assisted operations, to tracking numbers for mail surveys, to keystroke and mouse-click logs for online surveys. Here are the most common types:

  • Timestamps: The start time, end time, and duration of a survey or of individual sections within it. These reveal how long a respondent spent on each page and, for field staff, the travel time between successive interviews.
  • Keystroke and click logs: Every key press and mouse click recorded while a respondent interacts with a web page, including the exact elapsed time between actions.
  • Mouse movements: The path a respondent’s cursor takes across a page, which can indicate hesitation, confusion, or re-reading.
  • Device and browser information: Every time a browser connects to a survey website, it transmits a “user agent string” that identifies the device, operating system, browser type, screen resolution, and whether features like JavaScript or cookies are enabled.
  • GPS and location data: Mobile survey apps can record where an interview took place, useful for verifying that field staff actually visited assigned locations.
  • Audio and video recordings: Some mobile-based surveys use random recordings of interviewer-respondent interactions to evaluate interview quality and identify reasons for non-response.
  • Contact attempt histories: Logs of every call, visit, or email sent to a potential respondent, including the outcome of each attempt.
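A contact attempt history like the one above is straightforward to represent as structured records. A minimal sketch, using a hypothetical log format (the field names and outcome codes here are illustrative, not a standard):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ContactAttempt:
    case_id: str         # identifier for the sampled person or household
    mode: str            # "call", "visit", or "email"
    timestamp: datetime  # when the attempt was made
    outcome: str         # e.g. "no_answer", "refusal", "complete"

def attempts_per_case(log):
    """Count contact attempts per case - logged for respondents
    and nonrespondents alike, which is what makes this paradata
    useful for comparing the two groups."""
    counts = {}
    for a in log:
        counts[a.case_id] = counts.get(a.case_id, 0) + 1
    return counts

log = [
    ContactAttempt("c1", "call", datetime(2024, 5, 1, 18, 30), "no_answer"),
    ContactAttempt("c1", "call", datetime(2024, 5, 2, 19, 0), "complete"),
    ContactAttempt("c2", "visit", datetime(2024, 5, 1, 10, 0), "refusal"),
]
print(attempts_per_case(log))  # {'c1': 2, 'c2': 1}
```

Even this simple count already yields the "number of attempts" variable that later sections use as a proxy for respondent reluctance.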

These types fall into two broad categories for web surveys. Server-side paradata is captured whenever a user submits a page, recording what was entered and when. Client-side paradata uses scripting languages like JavaScript to track behavior while the user is still on the page, capturing finer-grained actions like individual keystrokes and cursor movements.
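Server-side paradata is coarser than client-side paradata, but per-page timings can still be derived from it: the time spent on a page is simply the gap between successive submit timestamps. A sketch, assuming a hypothetical log of (page name, submit time) pairs:

```python
from datetime import datetime

# Server-side paradata: one record per page submission.
# Hypothetical log entries: (page_name, submit_time).
submits = [
    ("intro",  datetime(2024, 5, 1, 12, 0, 0)),
    ("page_1", datetime(2024, 5, 1, 12, 1, 40)),
    ("page_2", datetime(2024, 5, 1, 12, 4, 10)),
]

def page_durations(submits):
    """Time spent on each page = gap between successive submits.
    The first entry has no predecessor, so no duration is derived
    for it - a known limitation of server-side timing."""
    durations = {}
    for (_, prev_t), (page, t) in zip(submits, submits[1:]):
        durations[page] = (t - prev_t).total_seconds()
    return durations

print(page_durations(submits))  # {'page_1': 100.0, 'page_2': 150.0}
```

Anything finer than this, such as how long the respondent hesitated before typing, requires client-side scripts running in the browser.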

Why Researchers Collect Paradata

Paradata serves three main purposes: improving the efficiency of data collection, predicting who will and won’t participate, and assessing whether the people who didn’t respond would have given meaningfully different answers from those who did.

One of the most common uses is analyzing call history data to understand survey participation. Longer field periods and more contact attempts yield higher response rates, but the effectiveness of repeated similar attempts diminishes over time. Paradata lets researchers see exactly when that diminishing return kicks in and adjust their strategy accordingly. If the data shows that most successful contacts happen on weekday evenings, resources can be shifted to those windows rather than spread evenly across the week.
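The time-window comparison described above reduces to grouping logged attempts by when they occurred and computing a success rate per group. A minimal sketch with invented data, splitting attempts into weekday evenings versus everything else (the window boundaries here are arbitrary assumptions):

```python
from collections import Counter
from datetime import datetime

# Hypothetical contact log: (timestamp, attempt_succeeded).
attempts = [
    (datetime(2024, 5, 6, 19, 15), True),   # Mon evening
    (datetime(2024, 5, 7, 10, 0), False),   # Tue morning
    (datetime(2024, 5, 7, 18, 45), True),   # Tue evening
    (datetime(2024, 5, 8, 11, 30), False),  # Wed morning
    (datetime(2024, 5, 8, 20, 0), False),   # Wed evening
]

def success_rate_by_window(attempts):
    """Share of successful contacts per time window: weekday
    evenings (Mon-Fri, 5pm-9pm) versus all other times."""
    totals, hits = Counter(), Counter()
    for t, ok in attempts:
        window = ("weekday_evening"
                  if t.weekday() < 5 and 17 <= t.hour < 21
                  else "other")
        totals[window] += 1
        hits[window] += ok
    return {w: hits[w] / totals[w] for w in totals}

print(success_rate_by_window(attempts))
```

With this toy data, two of three weekday-evening attempts succeed while the morning attempts all fail, which is exactly the kind of pattern that would justify shifting fieldwork resources toward evening windows.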

Paradata is also valuable for detecting nonresponse bias, the risk that people who declined to participate differ in important ways from those who completed the survey. Because paradata like contact logs and observation notes exist for both respondents and nonrespondents, researchers can compare the two groups on process-level variables. The number of call attempts needed to complete an interview, for instance, is often treated as a proxy for reluctance. By tracking how survey estimates shift as harder-to-reach respondents are gradually included, researchers can gauge whether nonresponse is skewing their results. If the estimates stabilize as more reluctant respondents come in, the bias is likely small. If the numbers keep shifting, there may be a systematic gap between responders and non-responders.
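This "level of effort" analysis can be sketched in a few lines: order respondents by the number of attempts they required, then watch the running estimate as harder-to-reach cases are folded in. The data below is invented purely for illustration:

```python
# Hypothetical respondents: (call_attempts_needed, reported_income).
respondents = [
    (1, 40_000), (1, 42_000), (2, 41_000),
    (3, 43_000), (5, 44_000), (8, 60_000),
]

def cumulative_means(respondents):
    """Running mean of the survey answer, adding respondents in
    order of effort required (easiest-to-reach cases first)."""
    ordered = sorted(respondents, key=lambda r: r[0])
    means, total = [], 0
    for i, (_, answer) in enumerate(ordered, start=1):
        total += answer
        means.append(total / i)
    return means

for n, m in enumerate(cumulative_means(respondents), start=1):
    print(f"after {n} easiest cases: mean = {m:,.0f}")
```

Here the running mean is nearly flat until the hardest-to-reach case arrives and pulls it up sharply, the pattern that would suggest nonresponders differ systematically from easy responders.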

Contact observations can even reveal topic-specific bias. Refusals citing health-related reasons in a health study, or lack of interest in politics in an election study, directly signal that nonresponse is correlated with the survey’s subject matter.

Detecting Interviewer Effects

When surveys are conducted by interviewers, the interviewers themselves can subtly influence the results. They might read questions differently, probe in inconsistent ways, or inadvertently signal expected answers. Paradata offers a way to spot these effects without relying on expensive observation or re-interviewing.

Research using the Panel Study of Income Dynamics found that keystroke and timestamp paradata explained more than half the magnitude of interviewer effects on average across survey items. Paradata actually outperformed demographic and work-related information about the interviewers in predicting these effects. That means survey organizations can use paradata as an active quality control tool, flagging interviewers whose process patterns deviate from the norm and targeting retraining where it’s needed.

Paradata in Public Health and Field Work

Mobile data collection has made paradata especially useful in public health surveillance. When field teams collect disease data using tablets or smartphones, the built-in sensors generate paradata automatically. Timestamps reveal whether interviewers are spending a realistic amount of time on each survey or rushing through. GPS coordinates confirm that staff visited the communities they were assigned to. Audio recordings can help supervisors evaluate whether interviews were conducted properly.

Digital dashboards that display this paradata in real time allow program managers to catch problems during data collection rather than discovering them months later during analysis. If a surveillance team member is consistently completing interviews in half the expected time, or if their GPS data shows they never left the office, the issue can be addressed immediately. Paradata analysis can prompt anything from retraining individual field staff to replanning surveillance activities entirely.
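The duration check a dashboard might run behind the scenes is simple: compare each interviewer's mean interview time against the overall mean and flag large deviations. A minimal sketch with invented durations and an arbitrary threshold:

```python
from statistics import mean

# Hypothetical interview durations (minutes) per interviewer.
durations = {
    "int_01": [32, 28, 35, 30],
    "int_02": [31, 29, 33, 34],
    "int_03": [14, 12, 15, 13],  # consistently about half the norm
}

def flag_rushers(durations, threshold=0.6):
    """Flag interviewers whose mean interview duration falls below
    a fraction of the overall mean across all interviews."""
    overall = mean(d for ds in durations.values() for d in ds)
    return [i for i, ds in durations.items()
            if mean(ds) < threshold * overall]

print(flag_rushers(durations))  # ['int_03']
```

A real system would need to control for legitimate variation, such as shorter questionnaires for some respondent groups, before treating a flag as evidence of rushing.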

Privacy Considerations

Paradata collection raises real privacy questions, particularly because many respondents don’t realize it’s happening. When you take an online survey, your browser automatically transmits information about your device, operating system, and screen resolution before you answer a single question. Client-side scripts can then record every mouse movement and keystroke as you work through the survey.

Research on informed consent has tested what happens when respondents are told about paradata collection. In one study, participants were shown different consent descriptions ranging from no mention of paradata to explicit explanations like: “In addition to your responses to the survey, we collect other data including keystrokes, time stamps, and characteristics of your browser.” The findings highlight a tension in the field: explicit consent about paradata collection, particularly about audio recording and location tracking, can reduce survey participation. But collecting behavioral data without informing people raises ethical concerns about transparency.

This tension is especially pronounced with client-side paradata, which captures continuous behavior rather than discrete submissions. Server-side paradata, recorded only when a user submits a page, feels more analogous to standard web server logs. But tracking every mouse movement on a page crosses into behavioral monitoring territory that most survey respondents wouldn’t expect.