Research begins with a question, and that question almost always comes from noticing something unexplained. Whether in a lab, a clinic, or a library, the starting point is the same: someone observes a gap between what is known and what needs to be known, then structures that gap into a question that can be tested. Everything that follows, from funding applications to published results, traces back to that initial moment of structured curiosity.
It Starts With Observation
The first formal step in any research process is observation. You notice something: a pattern in patient outcomes, a chemical reaction that behaves unexpectedly, a social trend that existing theories don’t explain. This isn’t passive noticing. It involves gathering and assembling information about an event, phenomenon, or exception to what was previously understood. The observation phase is where you absorb enough detail to realize something is missing or wrong in the current explanation.
From that observation, you define a problem by asking questions that are relevant and testable. A vague sense that “something interesting is happening” isn’t research yet. It becomes research when you can state the question precisely enough to design a way to answer it. That leads to a hypothesis, an educated guess that can be tested and, critically, proven wrong. If a hypothesis can’t be disproven by any possible result, it isn’t a scientific hypothesis. This requirement for falsifiability is what separates research from speculation.
Finding the Gap in What’s Already Known
Before launching a new study, researchers need to confirm that their question hasn’t already been answered. This is where a literature review comes in. You systematically read what’s been published on your topic and look for gaps: places where the evidence is missing, imprecise, biased, inconsistent, or simply not the right kind of information to answer the question at hand.
A structured approach developed by the Agency for Healthcare Research and Quality classifies research gaps using two elements. First, you characterize the gap itself, often using a framework that maps it against five dimensions, commonly abbreviated PICOS: the population studied, the intervention or exposure, the comparison group, the outcome measured, and the setting. Second, you identify why the gap exists. Maybe previous studies were too small to produce reliable results. Maybe they measured the wrong outcome, or the findings across studies contradicted each other with no clear resolution. Each type of gap points toward a different kind of study needed to fill it.
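To make the classification concrete, here is a minimal sketch, in Python, of how a reviewer might record a single gap along those two elements. The class name, field names, and the example gap are illustrative assumptions, not part of the AHRQ framework itself.

```python
from dataclasses import dataclass
from enum import Enum

class GapReason(Enum):
    """Common reasons a gap exists, paraphrasing the kinds described above."""
    INSUFFICIENT_EVIDENCE = "too few or too small studies"
    IMPRECISE_RESULTS = "estimates too imprecise to be reliable"
    BIASED_EVIDENCE = "risk of bias in existing studies"
    INCONSISTENT_FINDINGS = "studies contradict each other"
    WRONG_INFORMATION = "wrong population, outcome, or setting studied"

@dataclass
class ResearchGap:
    """One gap, characterized along five dimensions plus the reason it exists."""
    population: str
    intervention: str
    comparison: str
    outcome: str
    setting: str
    reason: GapReason

# Hypothetical gap noted during a literature review.
gap = ResearchGap(
    population="adults over 65 with type 2 diabetes",
    intervention="structured exercise program",
    comparison="standard dietary counseling",
    outcome="hospital readmission within 90 days",
    setting="rural primary care clinics",
    reason=GapReason.IMPRECISE_RESULTS,
)
print(f"Gap: {gap.outcome} in {gap.population} ({gap.reason.value})")
```

Recording gaps this way also makes the second element explicit: the reason a gap exists is what tells you whether the next study should be larger, measure something different, or target a population nobody has examined.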
This process matters because it prevents duplication and sharpens focus. A researcher who skips the literature review risks spending years answering a question that was resolved a decade ago, or framing their question so broadly that no single study could address it.
Choosing a Framework Before Collecting Data
Once you have a question and know the gap you’re trying to fill, you need a structure for your study. Think of it like creating blueprints before building a house. In research, this blueprint is called a framework, and it comes in two forms.
A theoretical framework starts with an existing theory and tests whether it holds up under new conditions. You’re looking for a reason why a relationship exists between two variables. For example, if a well-established theory predicts that a certain drug should reduce inflammation, a theoretical framework would structure a study around testing that prediction in a new patient population.
A conceptual framework works differently. It starts not with established theory but with concepts drawn from direct observation or intuition. This approach is useful when existing theories don’t apply or are insufficient. Instead of testing a prediction, you’re mapping out relationships between concepts and proposing how they might connect. These two frameworks aren’t contradictory. Some studies use elements of both. The key is that without some kind of structural blueprint, data collection becomes aimless.
Where Clinical Research Originates
In medicine, research often begins at the bedside. A physician notices that certain patients respond unusually to a treatment, or that a disease progresses differently than textbooks describe. That clinical observation becomes the seed of a research question. This pathway, sometimes called “bedside to bench,” has a long history. William Osler built the Johns Hopkins School of Medicine on the principle that clinics and laboratories had to be linked. His successor, Lewellys Barker, formalized the idea that the lab was where you could determine the underlying biology of what you observed in patients and develop new treatments based on that understanding.
This cycle runs in both directions. Basic research, the kind driven purely by curiosity about how things work, generates fundamental knowledge that eventually feeds into practical applications. As one landmark report on science policy put it, basic research creates the scientific capital from which practical applications must be drawn. New products and therapies don’t appear fully formed. They’re built on principles painstakingly developed through research in its purest form. Applied research then takes that knowledge and focuses it on solving specific real-world problems, including new diagnostics and treatments. Neither direction of the cycle has a fixed starting point. A lab discovery can prompt a clinical trial, and a clinical observation can send a scientist back to the lab.
Ethical Groundwork Before the Study Begins
For any study involving people, a significant amount of work happens before a single participant is enrolled. The Declaration of Helsinki, most recently revised in 2024, lays out the ethical principles governing research on human subjects, human data, and human tissues. These principles don’t replace the role of institutional review boards (IRBs) or local ethics committees, but they provide the ethical foundation that those bodies apply when reviewing study proposals.
The practical requirements for getting a study approved are extensive. At the NIH, for example, submitting a new study for review requires a full research protocol, informed consent documents, approval from scientific reviewers, conflict-of-interest clearances, all recruitment materials (including social media ads), every survey or questionnaire that participants will complete, and approvals from any relevant safety committees. If the study involves investigational drugs or devices, additional regulatory authorization is required before funding decisions are made. All of this must be in place before research on human subjects can begin. The preparation phase for a clinical study often takes months or even years.
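As a rough illustration of how a study team might track that preparation, the following sketch checks a submission against a checklist paraphrased from the items above. The item names and the helper function are hypothetical, not an official NIH system or API.

```python
# Illustrative only: the item names paraphrase the requirements listed above.
REQUIRED_ITEMS = [
    "research protocol",
    "informed consent documents",
    "scientific review approval",
    "conflict-of-interest clearances",
    "recruitment materials",
    "surveys and questionnaires",
    "safety committee approvals",
]

def missing_items(prepared: set[str]) -> list[str]:
    """Return the required items not yet prepared for submission."""
    return [item for item in REQUIRED_ITEMS if item not in prepared]

# A study team partway through preparation.
submission = {"research protocol", "informed consent documents"}
print("Still needed:", missing_items(submission))
```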
Securing Funding for Early-Stage Work
Most research can’t proceed without money, and getting funded is itself a structured process that begins well before any experiments run. Pilot studies, the small preliminary projects that test whether a larger study is feasible, have their own dedicated funding mechanisms. The NIH’s R34 planning grant, for instance, provides up to $450,000 in direct costs over three years (with a cap of $225,000 in any single year) specifically to support work that informs the design of a future clinical trial.
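For a sense of how those caps constrain planning, here is a small sketch that checks a hypothetical three-year direct-cost plan against the limits stated above. The budget figures in the example are invented for illustration.

```python
# Caps as stated above: $450,000 total direct costs, $225,000 in any single year.
TOTAL_CAP = 450_000
ANNUAL_CAP = 225_000

def check_r34_budget(yearly_direct_costs: list[float]) -> list[str]:
    """Return any ways the plan exceeds the stated caps."""
    problems = []
    if len(yearly_direct_costs) > 3:
        problems.append("project period exceeds three years")
    total = sum(yearly_direct_costs)
    if total > TOTAL_CAP:
        problems.append(f"total {total:,.0f} exceeds {TOTAL_CAP:,}")
    for year, cost in enumerate(yearly_direct_costs, start=1):
        if cost > ANNUAL_CAP:
            problems.append(f"year {year} ({cost:,.0f}) exceeds {ANNUAL_CAP:,}")
    return problems

# A hypothetical plan: front-loaded recruitment work, tapering in later years.
print(check_r34_budget([200_000, 150_000, 100_000]) or "within the stated caps")
```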
The requirements reveal how much planning is expected at this stage. Applicants must describe the future clinical trial they’re ultimately aiming for and explain exactly how the pilot data will inform decisions about that trial. The funding isn’t for running the trial itself, writing a protocol, or building infrastructure. It’s for answering the specific scientific questions that need to be resolved before a larger trial can be designed. This means the research question must be sharp enough, and the gap in knowledge clear enough, to justify the investment before any data is collected.
Planning for Data From the Start
One increasingly important part of how research begins is deciding, before data collection starts, how that data will be managed and shared. The FAIR principles, adopted across NIH-funded research, set the standard. Data should be findable, meaning it has unique identifiers and rich descriptions so both humans and computers can locate it. It should be accessible, meaning authorized users can retrieve it through standard methods. It should be interoperable, meaning it uses standardized formats that allow it to be combined with other datasets and analyzed by software, including machine learning tools. And it should be reusable, meaning it’s well-described enough that future researchers can work with it confidently.
Planning for FAIR data at the outset shapes decisions about what software to use, how to label variables, where to store files, and what metadata to record. Researchers who treat data management as an afterthought often find their results are difficult to reproduce or share, limiting the impact of work that may have taken years to complete.
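As one way to picture what that planning produces, here is a minimal sketch of the kind of dataset description a team might draft before collecting anything. The field names and example values are assumptions for illustration, not a prescribed NIH or repository schema.

```python
# A hypothetical dataset record organized around the four FAIR principles.
dataset_record = {
    # Findable: a persistent identifier and a rich description.
    "identifier": "doi:10.1234/example.dataset",  # hypothetical DOI
    "title": "Pilot survey responses, rural clinic cohort",
    "description": "De-identified baseline survey data from the pilot study.",
    "keywords": ["pilot study", "survey", "primary care"],
    # Accessible: how authorized users can retrieve it.
    "access": {"protocol": "https", "conditions": "controlled access, agreement required"},
    # Interoperable: standardized, machine-readable formats.
    "format": "CSV with a published data dictionary",
    "variable_naming": "documented naming convention shared across sites",
    # Reusable: enough provenance and licensing detail for future researchers.
    "license": "CC-BY-4.0",
    "provenance": "Collected 2025, exported from survey platform, cleaning script v1.2",
}

for field, value in dataset_record.items():
    print(f"{field}: {value}")
```

Writing even a rough record like this before data collection forces the decisions the paragraph above describes: which formats to use, how variables will be named, and what provenance details must be captured as the study runs rather than reconstructed afterward.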

