What Is the Data of an Experiment?: Types and Uses

The data of an experiment is everything you measure, observe, and record while running that experiment. It includes the numbers from your instruments, the categories you assign to what you observe, and the contextual details that explain how the experiment was performed. Without this collected information, an experiment is just an activity. The data is what turns it into evidence.

What Experimental Data Actually Includes

Most people think of experimental data as rows of numbers in a spreadsheet, but it’s broader than that. Experimental data is any measurement or observation collected during an experiment that can be used to answer a question or test a prediction. A biologist measuring protein concentrations, a physicist recording voltage from a sensor, a psychologist noting how participants respond to a prompt: all of these produce experimental data.

Critically, data also includes the context around those measurements. Knowing that a tumor sample showed a certain gene expression level is only useful if you also know the tumor type, how the sample was prepared, and what conditions it was stored under. A cell culture experiment requires records of culture conditions, how long the cells were treated, and even how densely the cells were growing before treatment. Strip away this context, and the raw numbers lose much of their meaning.

Quantitative vs. Qualitative Data

Experimental data falls into two broad categories. Quantitative data is anything expressed as a number. Qualitative data is anything expressed as a category, label, or description.

Quantitative data can be discrete or continuous. Discrete data involves whole numbers you can count: the number of surgeries a patient has had, the number of children in a family. Continuous data can take any value along a range: a person’s height, weight, range of motion, or bone density. In a clinical study, for example, researchers might record that patients averaged 54.5 years of age with an average trunk flexion of 102 degrees. These are continuous measurements.

Qualitative data sorts observations into groups rather than numbers. Recording a patient’s sex, whether they smoke, or what caused their injury (vehicle accident, fall, gunshot wound) produces qualitative data. Some qualitative data has a natural order, making it ordinal: a pain scale from 1 to 5, or a disability classification that ranges from “normal function” to “no function,” ranks observations from less to more. Other qualitative data, like marital status or eye color, has no built-in ranking at all and is called nominal.

Many experiments collect both types simultaneously. A single study might record a patient’s weight (quantitative) alongside their smoking status (qualitative) and their score on a functional outcome scale that blends both.
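
One way to make these distinctions concrete is a record that holds all four kinds at once. The sketch below uses hypothetical field names; the values echo the clinical examples above.

```python
from dataclasses import dataclass

# Hypothetical patient record mixing the data types described above.
@dataclass
class PatientRecord:
    prior_surgeries: int      # quantitative, discrete (countable whole numbers)
    trunk_flexion_deg: float  # quantitative, continuous (any value in a range)
    smoker: str               # qualitative, no inherent order (nominal)
    pain_score: int           # qualitative, ordered 1-5 categories (ordinal)

record = PatientRecord(prior_surgeries=2, trunk_flexion_deg=102.0,
                       smoker="no", pain_score=3)

# Ordinal categories can be ranked; nominal categories cannot.
PAIN_LEVELS = [1, 2, 3, 4, 5]
assert record.pain_score in PAIN_LEVELS
```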

Raw Data vs. Processed Data

The first numbers that come off an instrument or out of an observation are called raw data. Raw data is unfiltered, unscaled, and often not immediately useful. A pressure sensor might output readings in millimeters of water. A microphone produces a voltage signal for each data point. A sensor on a rotating machine reports pulse counts per second.

Processed data is what you get after transforming that raw information into something meaningful. That pressure reading gets converted into standard units using a known equation. The signal counts get converted into surface speed. The microphone voltage gets filtered to remove noise, then broken into its component frequencies so you can see which sounds dominate. The methods used for this processing are typically documented in detail so someone else could repeat the same steps and arrive at the same result.
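
Two of the steps above can be sketched in a few lines. The pressure conversion uses the standard factor 1 mmH2O = 9.80665 Pa; the moving average stands in for the filtering step, as a deliberately simple example of noise smoothing. The function names and sample values are hypothetical.

```python
# Raw-to-processed sketch for a pressure sensor reporting millimeters
# of water (mmH2O). Standard conversion: 1 mmH2O = 9.80665 Pa.
MM_H2O_TO_PA = 9.80665

def to_pascals(raw_mm_h2o):
    """Convert a raw reading in mmH2O to pascals."""
    return raw_mm_h2o * MM_H2O_TO_PA

def moving_average(samples, window=3):
    """Smooth noise out of a raw signal with a simple sliding mean."""
    return [sum(samples[i:i + window]) / window
            for i in range(len(samples) - window + 1)]

raw = [10.0, 10.2, 9.8, 10.1, 10.0]            # raw readings, mmH2O
processed = [to_pascals(x) for x in moving_average(raw)]
```

Keeping `raw` around alongside `processed` reflects the practice described below: the original readings can always be reprocessed with a different filter later.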

Neither form is more “real” than the other. Raw data preserves the original measurement, which matters if you want to reprocess it later using a different method. Processed data is what you actually analyze and present. Good practice is to keep both.

The Role of Metadata

Metadata is the information about your data. It doesn’t describe what you found; it describes how, when, why, and with what tools you found it. Harvard Medical School’s data management guidelines break this into several layers:

  • Reagent metadata: details about the biological or chemical materials used, such as cell lines, antibodies, or drug compounds.
  • Technical metadata: information generated automatically by instruments and software, like timestamps, calibration settings, or file formats.
  • Experimental metadata: the conditions of the experiment itself, including the type of assay, time points, and the step-by-step protocol followed.
  • Analytical metadata: how the data was analyzed, including software names and versions, quality control steps, and output file types.
  • Dataset-level metadata: the big picture, covering research objectives, who the investigators were, funding sources, and related publications.

This information is typically stored in lab notebooks, written protocols, README files, and data dictionaries (documents that define every variable in a dataset). Without metadata, a spreadsheet full of numbers is essentially meaningless to anyone who wasn’t in the room when the experiment happened.
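
The layers above can be captured in something as simple as a structured dictionary alongside the dataset. Every key and value in this sketch is hypothetical; the point is that each layer gets recorded somewhere, and missing layers can be flagged automatically.

```python
# Minimal sketch of the five metadata layers as a plain dictionary.
metadata = {
    "reagent": {"cell_line": "HeLa", "antibody": "anti-GFP"},
    "technical": {"instrument": "PlateReader-X",
                  "timestamp": "2024-03-01T09:30:00Z"},
    "experimental": {"assay": "viability", "timepoints_h": [0, 24, 48]},
    "analytical": {"software": "AnalysisTool v2.1",
                   "qc": "outliers removed by IQR"},
    "dataset": {"objective": "drug response screen",
                "funding": "grant ABC-123"},
}

def missing_layers(meta, required=("reagent", "technical", "experimental",
                                   "analytical", "dataset")):
    """Flag any metadata layer that was never recorded or left empty."""
    return [layer for layer in required if not meta.get(layer)]

assert missing_layers(metadata) == []
```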

How Researchers Know the Data Is Trustworthy

Collecting data is one thing. Trusting it is another. Researchers evaluate data quality through two lenses: internal validity and external validity.

Internal validity asks whether the results actually reflect what’s happening in the experiment, rather than being caused by errors in design or execution. If a study has poor internal validity, its findings may be wrong, and no further interpretation matters. Careful planning, adequate sample sizes, consistent data collection, and rigorous quality control all strengthen internal validity.

External validity asks whether the results apply beyond the specific group studied. A drug trial conducted only on young, healthy men may produce internally valid results that don’t hold up in older women. Using broader inclusion criteria and choosing interventions that mirror real-world conditions helps data generalize to a wider population.

Once data passes these checks, researchers use statistical tools to interpret it. A p-value measures the strength of evidence against the assumption that nothing interesting is happening (the “null hypothesis”). A smaller p-value means the observed result would be less likely if that assumption were true, though it says nothing about how large or important the effect is. Confidence intervals fill that gap by providing a range of plausible values for the true effect. A narrow confidence interval signals a precise estimate; a wide one signals uncertainty. Together, these tools help researchers move from raw observations to conclusions they can stand behind.
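
The narrow-versus-wide behavior of confidence intervals is easy to see with standard-library Python. This is a sketch using a normal approximation (z ≈ 1.96 for 95%); a real analysis on samples this small would use a t-distribution. The sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval_95(samples):
    """Approximate 95% CI for the mean via a normal approximation.

    A sketch only: small samples would properly use a t-distribution.
    """
    n = len(samples)
    m = mean(samples)
    se = stdev(samples) / sqrt(n)     # standard error of the mean
    z = NormalDist().inv_cdf(0.975)   # ~1.96 for a two-sided 95% CI
    return m - z * se, m + z * se

# Same mean, different spread: the noisier data yields a wider interval.
tight = confidence_interval_95([54.0, 54.5, 55.0, 54.2, 54.8])
wide = confidence_interval_95([40.0, 70.0, 50.0, 60.0, 52.5])
```

Both sets of samples average 54.5, but the second interval is far wider, signaling a much less precise estimate of the true mean.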

Sharing and Preserving Experimental Data

Modern science increasingly treats data as a product in its own right, not just a byproduct of a study. The FAIR principles, published in the journal Scientific Data in 2016, set the standard: scientific data should be Findable, Accessible, Interoperable, and Reusable. That means assigning data a permanent identifier so others can locate it, storing it in formats that different software can read, and documenting it thoroughly enough that another researcher could pick it up and use it independently.

These aren’t just ideals. Since January 25, 2023, the National Institutes of Health requires every grant applicant to submit a formal data management and sharing plan as part of their funding proposal. The plan must describe how data will be stored, maintained, and made available to other researchers. NIH program staff review and approve the plan before funding is awarded, and following through on it becomes a condition of the grant. For small business research programs, data can be withheld for up to 20 years after the award date, but for most federally funded research, the expectation is open sharing.

The goal behind all of this is reproducibility. An experiment’s data is only as valuable as someone else’s ability to understand it, verify it, and build on it.