RWD stands for real-world data, a term for health information collected during routine medical care rather than in controlled research settings. The FDA defines it as data relating to patient health status or the delivery of health care routinely collected from a variety of sources. RWD feeds into something called real-world evidence (RWE), which is what you get when researchers analyze that data to draw conclusions about how treatments work, how safe they are, or how much they cost.
Where Real-World Data Comes From
RWD is pulled from places where health information already exists. The most commonly used sources are electronic health records (EHRs) and medical insurance claims, both of which contain patient diagnoses, treatments, prescriptions, and outcomes recorded during everyday clinical care. Disease and product registries, which track specific conditions or devices over time, are another major source.
A growing category is patient-generated data. Wearable devices like smartwatches continuously capture heart rate, physical activity, sleep patterns, and community mobility. This type of data differs from traditional patient-reported outcomes, which rely on standardized surveys about fatigue, pain, or satisfaction. Wearable data is generated by patients but collected passively through commercial tools, which means it doesn't carry the same administrative burden for patients or providers and doesn't exclude people with low literacy the way written questionnaires can.
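As a toy illustration of what "passive collection" means in practice, the sketch below rolls raw wearable heart-rate samples into daily summaries with no patient input at any step. The timestamps and values are entirely hypothetical.

```python
# Toy illustration of passively collected wearable data: raw heart-rate
# samples rolled up into per-day summaries without any patient input.
# All timestamps and readings are hypothetical.
from collections import defaultdict

samples = [
    ("2024-05-01T08:00", 72), ("2024-05-01T12:00", 88),
    ("2024-05-01T20:00", 65), ("2024-05-02T08:00", 70),
]

def daily_summary(samples):
    """Group samples by calendar day and report min/mean/max heart rate."""
    by_day = defaultdict(list)
    for ts, bpm in samples:
        by_day[ts[:10]].append(bpm)          # "YYYY-MM-DD" prefix is the day
    return {day: {"min": min(v), "max": max(v),
                  "mean": round(sum(v) / len(v), 1)}
            for day, v in by_day.items()}

print(daily_summary(samples))
# → {'2024-05-01': {'min': 65, 'max': 88, 'mean': 75.0},
#    '2024-05-02': {'min': 70, 'max': 70, 'mean': 70.0}}
```

Contrast this with a patient-reported outcome, where each data point requires someone to read, interpret, and answer a survey item.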
How RWD Differs From Clinical Trial Data
Traditional clinical trials, especially randomized controlled trials (RCTs), test treatments under tightly controlled conditions. Participants are selected using strict criteria, follow a fixed treatment pattern, and are continuously monitored according to a protocol. The study group is intentionally homogeneous. The comparator is usually a placebo or a single alternative treatment.
RWD captures the opposite scenario. Patients are heterogeneous, reflecting a much broader range of ages, health conditions, and backgrounds. Treatment patterns vary because doctors adjust care based on individual needs. Follow-up happens in actual practice rather than on a rigid schedule, and patients are compared against many alternative treatments rather than just one. This practical nature is both its strength and its limitation. RWD better reflects what happens in real clinical settings, but the data wasn’t designed for research purposes, which introduces its own challenges.
The core tradeoff: RCTs excel at proving whether a drug works under ideal conditions (efficacy), while RWD is better suited to understanding how a treatment performs across diverse real patients (effectiveness), along with its safety profile and cost implications.
How RWD Is Used in Drug Development
The FDA has a long history of using RWD to monitor the safety of drugs after they reach the market. Post-approval surveillance, where regulators track side effects and complications in the broader population, has been the most established use case. RWD has also supported effectiveness assessments, though historically on a more limited basis.
In 2018, the FDA published a framework for evaluating whether RWE could help support approval of new uses for already-approved drugs or satisfy post-approval study requirements. Since then, the agency has released guidance on specific applications, including how to design externally controlled trials (where the comparison group is drawn from real-world data rather than enrolled in the study) and how to integrate randomized trials into routine clinical practice.
One particularly valuable application is in rare diseases, where enrolling enough patients for a traditional control group can be ethically or practically impossible. Researchers use RWD from health records or prior trials to construct what’s called a synthetic control arm, essentially a comparison group built from historical patient data. This can increase statistical power and reduce the time and cost of a trial without requiring additional patients to go untreated. Some trials use a hybrid approach, combining a smaller group of traditional control patients with synthetic controls drawn from RWD to preserve some benefits of randomization while still boosting sample size.
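A minimal sketch of how a synthetic control arm might be assembled: match each trial patient to the most similar historical patient on baseline covariates. Real implementations typically use propensity scores over many covariates; the nearest-neighbor matching, field names, and patient records below are all illustrative assumptions.

```python
# Minimal sketch: building a synthetic control arm by nearest-neighbor
# matching on baseline covariates. Field names ("age", "severity") and
# patients are hypothetical; real pipelines usually match on propensity
# scores estimated from many more covariates.

def zscore_stats(rows, keys):
    """Per-covariate mean and standard deviation, for scale-free distances."""
    stats = {}
    for k in keys:
        vals = [r[k] for r in rows]
        mean = sum(vals) / len(vals)
        sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
        stats[k] = (mean, sd or 1.0)         # guard against zero variance
    return stats

def match_controls(treated, historical, keys):
    """Pair each treated patient with its closest unused historical patient."""
    stats = zscore_stats(treated + historical, keys)

    def dist(a, b):
        return sum(((a[k] - b[k]) / stats[k][1]) ** 2 for k in keys)

    used, pairs = set(), []
    for t in treated:
        i, h = min(((i, h) for i, h in enumerate(historical) if i not in used),
                   key=lambda ih: dist(t, ih[1]))
        used.add(i)
        pairs.append((t["id"], h["id"]))
    return pairs

treated = [{"id": "T1", "age": 62, "severity": 3.1},
           {"id": "T2", "age": 48, "severity": 1.9}]
historical = [{"id": "H1", "age": 70, "severity": 3.0},
              {"id": "H2", "age": 50, "severity": 2.0},
              {"id": "H3", "age": 61, "severity": 3.2}]
print(match_controls(treated, historical, ["age", "severity"]))
# → [('T1', 'H3'), ('T2', 'H2')]
```

In a hybrid design, the matched historical patients would be pooled with a smaller concurrently randomized control group rather than replacing it outright.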
In December 2025, the FDA issued updated guidance clarifying how it evaluates RWD quality for regulatory decisions about medical devices, expanding and replacing earlier recommendations from 2017. This signals a continuing shift toward formally integrating real-world data into the approval process.
Data Quality and Interoperability Challenges
The biggest obstacle to using RWD effectively is that the data often sits in disconnected systems that can’t communicate with each other. The U.S. relies on multiple independent healthcare systems running different software versions, and the lack of centralized, enforceable data standards means information gets lost or distorted as it moves between hospitals, labs, physician offices, and public health programs.
Without universal standards, most health data exchange is subject to the constraints of proprietary systems with non-interoperable data formats. This creates several practical problems. Coding systems may not capture important details like the limits of detection for a lab test or how a particular measurement was taken. Deaths that occur outside a hospital may never appear in a patient’s medical record. When researchers try to use this data for analysis, they encounter missing variables, inconsistent definitions, and fragmented patient histories that span multiple systems with no easy way to link them.
These gaps don’t make RWD useless, but they do mean that significant effort goes into cleaning, standardizing, and validating the data before it can generate reliable evidence.
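A first step in that cleaning effort is simply profiling the data: counting missing required fields and flagging implausible values before any analysis. The sketch below shows one way that might look; the field names, the plausibility range, and the records are hypothetical, and real pipelines also map source codes to a common data model before this stage.

```python
# Minimal sketch of a pre-analysis quality check over pooled EHR records.
# Field names, the LDL plausibility range, and the records themselves are
# hypothetical; real pipelines also harmonize coding systems first.

REQUIRED = ["patient_id", "dx_code", "ldl_mg_dl"]

def profile(records):
    """Count missing required fields and flag out-of-range lab values."""
    missing = {k: 0 for k in REQUIRED}
    out_of_range = 0
    for r in records:
        for k in REQUIRED:
            if r.get(k) is None:
                missing[k] += 1
        ldl = r.get("ldl_mg_dl")
        if ldl is not None and not (20 <= ldl <= 400):
            out_of_range += 1               # likely a sentinel or unit error
    return {"missing": missing, "out_of_range": out_of_range}

records = [
    {"patient_id": "A", "dx_code": "E11.9", "ldl_mg_dl": 130},
    {"patient_id": "B", "dx_code": None,    "ldl_mg_dl": 999},
    {"patient_id": "C", "dx_code": "I10",   "ldl_mg_dl": None},
]
print(profile(records))
# → {'missing': {'patient_id': 0, 'dx_code': 1, 'ldl_mg_dl': 1}, 'out_of_range': 1}
```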
Privacy Requirements for RWD Use
Since RWD originates from clinical care, using it for research means repurposing information beyond its original intent. In the U.S., the Health Insurance Portability and Accountability Act (HIPAA) governs health data from providers and insurers. In the European Union, the General Data Protection Regulation (GDPR) covers secondary data use.
Both frameworks allow research use when patient data is properly de-identified. Under HIPAA, de-identification means removing enough information that the data neither identifies nor provides a reasonable basis to identify an individual. There are two paths to achieve this: the Safe Harbor method, which requires removing 18 specific identifiers (names, dates, geographic details, and similar fields), and Expert Determination, which relies on an expert with appropriate statistical or scientific knowledge certifying that the risk of re-identification is very small.
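To make the Safe Harbor idea concrete, here is a simplified sketch of stripping and generalizing fields from a record. The field list is illustrative, not the full set of 18 identifiers, and the real rule carries extra conditions (small ZIP-code areas, ages over 89) not shown here.

```python
# Simplified sketch of Safe Harbor-style de-identification. The identifier
# list is illustrative, NOT the complete set of 18 HIPAA identifiers; the
# real rule has further conditions (small ZIP populations, ages over 89).

DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "mrn"}

def deidentify(record):
    out = {}
    for k, v in record.items():
        if k in DIRECT_IDENTIFIERS:
            continue                                 # drop outright
        if k.endswith("_date"):
            out[k.replace("_date", "_year")] = v[:4]  # keep year only
        elif k == "zip":
            out[k] = v[:3] + "00"                    # 3-digit ZIP prefix
        else:
            out[k] = v
    return out

rec = {"name": "Jane Doe", "mrn": "12345", "zip": "90210",
       "admit_date": "2021-03-14", "dx_code": "E11.9"}
print(deidentify(rec))
# → {'zip': '90200', 'admit_year': '2021', 'dx_code': 'E11.9'}
```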
GDPR takes a similar but distinct approach, requiring either full anonymization (irreversibly altering data so individuals can't be identified by any means reasonably likely to be used) or pseudonymization (replacing identifiers with codes). GDPR doesn't prescribe specific techniques the way HIPAA's Safe Harbor method does, instead relying on expert judgment and documentation. Once data is truly de-identified under HIPAA or anonymized under GDPR, those regulations no longer apply to it, though stricter state or member-state laws may still apply.
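A common way to implement pseudonymization is a keyed hash: the same patient always maps to the same code, but re-linking requires the key, which must be stored separately from the data. This is a sketch of the general technique, not a prescribed GDPR method, and the key shown is a placeholder.

```python
# Minimal sketch of pseudonymization via a keyed hash (HMAC). Under GDPR
# this is pseudonymized, NOT anonymized: anyone holding the key can
# re-link the codes, so the key must be stored separately from the data.
import hmac
import hashlib

SECRET_KEY = b"stored-separately-from-the-data"   # placeholder key

def pseudonym(patient_id: str) -> str:
    """Deterministic keyed code: same patient, same code, every time."""
    return hmac.new(SECRET_KEY, patient_id.encode(),
                    hashlib.sha256).hexdigest()[:12]

record = {"patient_id": "MRN-0042", "dx_code": "I10"}
record["patient_id"] = pseudonym(record["patient_id"])
```

Determinism matters here: it lets records for the same patient be linked across datasets without ever exposing the original identifier.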
Both the FDA and professional organizations like ISPOR have emphasized that privacy protection is a core component of data governance for RWD. The FDA’s fit-for-purpose assessment requires documentation of any data transformations made for privacy protection, particularly when linking datasets from different sources.

