What Does “Results” Mean in Science, Explained

In science, “results” are the factual findings of a study or experiment, presented as data that answers the original research question. Results are distinct from opinions or interpretations. They are the raw, objective answer to “what did you find?” before anyone tries to explain why it happened or what it means for the bigger picture.

That definition applies whether you’re talking about the results section of a published paper, the results stage of the scientific method, or the way scientists use the word casually. The concept is the same: results are what the data actually showed.

Results vs. Interpretation

The most important thing to understand about results in science is that they are separate from the meaning people attach to them. A research paper has a results section and a discussion section for exactly this reason. The results section tells the reader what was found. The discussion section tells the reader what those findings mean.

Say a researcher tests whether a new fertilizer helps tomato plants grow taller. The result might be: “Plants given the fertilizer grew an average of 12 centimeters taller over six weeks than plants without it.” That’s a factual observation. The discussion is where the researcher would explore why, compare it to other fertilizers, or suggest what farmers should do with that information. Results are presented in an unbiased, objective tone without any attempt to analyze or explain them. This separation exists to let other scientists evaluate the raw findings on their own terms, without being steered toward a particular conclusion.

How Results Are Reported

Scientific results can take several forms depending on the type of research. Quantitative results are numerical: measurements, counts, percentages, averages. Qualitative results are descriptive, drawn from interviews, observations, or case studies, and presented as narratives rather than numbers. Most research people encounter in health, biology, and physical sciences is quantitative, but qualitative research plays a critical role in fields like psychology and public health. For example, quantifying that a community has a low vaccination rate is one kind of result. Interviewing parents to learn why they aren’t vaccinating their children produces a different, equally valuable kind of result.

In published papers, results typically appear as a combination of text, tables, and figures. Tables present exact numbers, often with measures of uncertainty like confidence intervals or standard errors. Graphs and charts make patterns easier to spot at a glance. Bar charts, line graphs, scatter plots, and histograms are common. A well-constructed table or figure should be understandable on its own, without needing to read the rest of the paper. Major medical journal guidelines require that researchers report both the actual numbers and any percentages derived from them, so readers can verify the math themselves.
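For a sense of how those uncertainty measures are produced, here is a minimal sketch in Python using numpy and scipy. The plant-height measurements are invented for illustration; real reporting conventions vary by field and journal:

```python
import numpy as np
from scipy import stats

# Hypothetical plant-height measurements in centimeters (invented numbers).
heights = np.array([41.2, 39.8, 43.1, 40.5, 42.0, 38.9, 41.7, 40.3])

mean = heights.mean()
sem = stats.sem(heights)  # standard error of the mean

# 95% confidence interval using the t distribution (n - 1 degrees of freedom).
ci_low, ci_high = stats.t.interval(0.95, df=len(heights) - 1, loc=mean, scale=sem)

print(f"mean = {mean:.1f} cm, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

A table reporting this result would typically show the mean alongside the interval, exactly so a reader can judge the uncertainty without redoing the calculation.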

From Raw Data to Final Results

Results in a published study are rarely the first numbers that came out of an instrument. There’s a careful process between collecting raw data and presenting polished results, and understanding that process helps you understand what results actually represent.

A chemist running samples through an analytical machine, for instance, doesn’t just report whatever number the machine spits out. First, a standard curve is created by running samples of known concentration through the machine to calibrate it. Then experimental samples are run, often in triplicate (three times each) to check consistency. If the machine’s sensitivity drifts over time, which it often does, correction factors are applied. The final reported result might be an average of three runs, adjusted for instrument drift, displayed as a bar chart with error bars showing how much variation existed between the individual measurements.
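The pipeline described above can be sketched in a few lines of Python. Everything here is hypothetical: the calibration points, the triplicate signals, and the per-run drift factors are invented to illustrate the shape of the process, not any particular instrument:

```python
import numpy as np

# Hypothetical standard curve: known concentrations vs. instrument readings.
known_conc = np.array([0.0, 5.0, 10.0, 20.0, 40.0])    # mg/L
signals    = np.array([0.02, 0.48, 0.95, 1.92, 3.85])  # raw instrument signal

# Fit signal = slope * concentration + intercept, then invert it.
slope, intercept = np.polyfit(known_conc, signals, deg=1)

def to_concentration(signal):
    """Convert a raw signal to a concentration via the standard curve."""
    return (signal - intercept) / slope

# Triplicate runs of one experimental sample. The drift factors are invented:
# they assume sensitivity rose ~2% per run, so each signal is scaled back down.
triplicate = np.array([1.21, 1.19, 1.24])
drift      = np.array([1.00, 1.02, 1.04])
concentrations = to_concentration(triplicate / drift)

# The reported result: mean of three corrected runs, with the sample standard
# deviation as the error bar.
print(f"{concentrations.mean():.2f} ± {concentrations.std(ddof=1):.2f} mg/L")
```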

None of this is manipulation. It’s standard scientific practice to clean and calibrate data before reporting it. The key is transparency: a good study describes exactly what was done to the data so other scientists can judge whether those steps were appropriate.

What Makes a Result “Significant”

When scientists call a result “statistically significant,” they mean the finding is unlikely to have occurred by pure chance. The standard threshold is a p-value below 0.05, which means that if nothing real were going on, a result at least that extreme would appear less than 5% of the time. A p-value below 0.01 is considered stronger evidence, and below 0.001 stronger still.
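As a concrete illustration, here is how a p-value comes out of a simple two-sample comparison in Python. The two groups are simulated, so the exact numbers depend on the random seed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two simulated groups: the treated plants really do grow ~2 cm taller.
control = rng.normal(loc=40.0, scale=3.0, size=30)
treated = rng.normal(loc=42.0, scale=3.0, size=30)

t_stat, p_value = stats.ttest_ind(treated, control)
print(f"p = {p_value:.4f}, significant at the 0.05 threshold: {p_value < 0.05}")
```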

This threshold isn’t a law of nature. It’s a convention that dates back to the statistician R.A. Fisher in the early 20th century, and it’s somewhat arbitrary. A 0.05 threshold means that if you ran 20 comparisons where there was truly no effect, you’d expect about one of them to appear “significant” by chance alone. That’s why scientists are also encouraged to report confidence intervals and effect sizes rather than relying on p-values alone. A confidence interval tells you the range within which the true value likely falls, while effect size tells you how large the difference actually was, not just whether it was real.
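A quick simulation makes the one-in-twenty intuition tangible. The sketch below runs batches of 20 comparisons where both groups come from the same distribution, so any “significant” result is a false positive by construction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
trials, hits = 500, 0

for _ in range(trials):
    # A batch of 20 comparisons with truly no effect: both groups are
    # drawn from the same distribution, so every "hit" is a false positive.
    p_values = [stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
                for _ in range(20)]
    hits += sum(p < 0.05 for p in p_values)

print(f"significant results per batch of 20 null tests: {hits / trials:.2f}")
# Expect a value near 1.0, matching the one-in-twenty intuition.
```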

Statistical significance doesn’t automatically mean a result is important or meaningful in practical terms. A drug could produce a statistically significant improvement that amounts to only a fraction of a degree in temperature reduction. The result is real, but it might not matter to a patient.
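This gap between statistical and practical significance is easy to demonstrate: with a large enough sample, even a trivially small difference produces a tiny p-value. The numbers below are invented, and Cohen’s d is used here as one common effect-size measure:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Enormous samples, trivially small true difference (all numbers invented):
# the drug lowers temperature by about 0.02 degrees Celsius on average.
placebo = rng.normal(loc=37.00, scale=0.40, size=200_000)
drug    = rng.normal(loc=36.98, scale=0.40, size=200_000)

_, p = stats.ttest_ind(placebo, drug)
cohens_d = (placebo.mean() - drug.mean()) / placebo.std(ddof=1)

print(f"p = {p:.1e}            (statistically significant)")
print(f"Cohen's d = {cohens_d:.3f}  (practically negligible)")
```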

Null Results and Why They Matter

A “null result” occurs when a study finds no meaningful effect or difference. The fertilizer didn’t help. The drug performed no better than a placebo. These results are just as scientifically valid as positive ones, but they have historically been much harder to publish. This phenomenon is called publication bias, sometimes referred to as “the file drawer problem,” because studies with unexciting findings end up filed away rather than shared.

This creates real problems. When only positive results get published, the scientific literature becomes skewed. Effect sizes look larger than they really are. Meta-analyses that combine multiple studies reach distorted conclusions. Other researchers waste time and funding repeating experiments without knowing someone already tried and found nothing. Publishing null results prevents these issues and challenges incorrect ideas that might otherwise go untested.
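A small simulation shows how this skew arises. Suppose a modest true effect is studied with small, underpowered samples; if only the studies that cross p < 0.05 get published, the published effect estimates run well above the truth (all parameters here are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_effect, n = 0.2, 30   # modest real effect, small underpowered studies
all_estimates, published = [], []

for _ in range(2000):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    estimate = treated.mean() - control.mean()
    all_estimates.append(estimate)
    if stats.ttest_ind(treated, control).pvalue < 0.05:
        published.append(estimate)   # only "positive" studies see print

print(f"true effect:                      {true_effect}")
print(f"mean estimate, all studies:       {np.mean(all_estimates):.2f}")
print(f"mean estimate, published studies: {np.mean(published):.2f}")
```

A meta-analysis built only on the published studies would inherit exactly this inflation.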

When Results Go Wrong

Not all published results are trustworthy, and one of the biggest reasons is a practice called p-hacking. This is when researchers analyze their data in many different ways, testing multiple comparisons or tweaking their methods, until they find something that crosses the p-value threshold of 0.05. That one “significant” finding is then reported as if it were the result the study was designed to test all along.
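A rough sketch of why this works: if a researcher measures many unrelated outcomes where nothing real is going on and reports only the best one, the odds of landing at least one “significant” result are better than a coin flip. All of the data below are pure noise:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# One "study" measuring 15 unrelated outcomes, all pure noise.
# The p-hacker tests every outcome and reports only the smallest p-value.
p_values = [stats.ttest_ind(rng.normal(size=25), rng.normal(size=25)).pvalue
            for _ in range(15)]

print(f"smallest of 15 null p-values: {min(p_values):.3f}")
# Chance that at least one of 15 independent null tests dips below 0.05:
print(f"P(at least one 'finding') = {1 - 0.95**15:.0%}")  # about 54%
```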

The fact that an unusually large number of published results cluster just below the 0.05 threshold has raised alarms that many findings in the literature may be products of this kind of data mining rather than genuine discoveries. P-hacking is a major contributor to the reproducibility crisis, the finding that many published scientific results can’t be replicated when other labs try to repeat them.

Several solutions have gained traction. Preregistration requires researchers to publicly record their hypotheses and analysis plans before collecting data, making it much harder to quietly shift the goalposts. Registered Reports go further, with journals agreeing to publish a study based on its design regardless of whether the results are positive or null. Data sharing allows other scientists to re-analyze the original numbers independently.

How Scientists Confirm Results

A single study’s results, no matter how well conducted, are not the final word. Science builds confidence in findings through repetition. The National Academies of Sciences, Engineering, and Medicine distinguish between two types of confirmation. Reproducibility means taking the same data and running the same analysis to see if you get the same answer. Replicability means conducting a new, independent study to see if the same pattern holds.

Because of natural variability and the limitations of measurement tools, scientific results are always probabilistic rather than absolute. No single experiment delivers certainty. Instead, a finding earns a higher or lower likelihood of being true depending on how consistently it holds up across repeated testing. This is why headlines about a single study “proving” something are almost always overstating the case. Results gain credibility through accumulation: multiple independent teams, using different methods and populations, arriving at the same conclusion.