Which Definition Best Identifies Duplicate Medical Records?

A duplicate medical record occurs when a single patient is associated with more than one medical record number (MRN) within the same healthcare system. That is the standard definition used across health information management. The key distinction is one person, multiple records, rather than multiple people sharing one record. Understanding this definition matters because duplicate records are a documented risk factor for preventable medical errors, and they affect an estimated 5% to 16% of records in healthcare systems nationwide.

The Core Definition Explained

The definition centers on a simple concept: one patient should have one unique identifier in a healthcare database. When registration errors, spelling variations, or system migrations cause the same person to be assigned two or more MRNs, each of those records becomes a “duplicate.” Neither record is complete on its own. One might contain lab results from an emergency visit while the other holds primary care notes and medication lists. A clinician pulling up only one of those records sees a fragmented picture of the patient’s health.

This fragmentation is what makes duplicates dangerous. The Agency for Healthcare Research and Quality identifies the presence of duplicate records as a risk factor for preventable errors. A provider who doesn’t see a patient’s full medication history could prescribe a drug that interacts with something already being taken. Duplicate lab orders waste time and money. Insurance claims get denied when records don’t match.

Duplicates vs. Overlays

Duplicate records are often confused with overlays, but the two problems are distinct and carry different levels of urgency.

Duplicate: One patient has two or more separate records. The information in each record belongs to the correct person but is split across files.
Overlay: Data from two different patients has been combined into a single record, or completely switched between records. One patient’s chart now contains another person’s information.

While duplicates are a serious concern, overlays are considered more dangerous. A duplicate means incomplete information. An overlay means incorrect information, which could lead a clinician to act on lab results, allergies, or diagnoses that belong to someone else entirely. Healthcare organizations generally prioritize resolving overlays first because of this higher patient safety risk. As one survey of Twin Cities healthcare organizations noted, a duplicate is actually preferable to a commingled record where two patients have been erroneously merged together.

How Duplicates Get Created

Duplicates rarely happen because someone is careless. They accumulate naturally over time through everyday scenarios. A patient registers at an emergency department and gives a slightly different name spelling, perhaps using a nickname or a married name. A front desk clerk transposes two digits of a birth date. A patient visits a new clinic within the same health system and gets assigned a fresh MRN because the system doesn’t recognize them.

The problem compounds when patient information changes over time. Addresses, phone numbers, employers, and insurance carriers all shift. Each change creates another opportunity for the system to fail to connect a new visit to an existing record. Large health systems with multiple facilities are especially vulnerable because different sites may each assign their own MRNs independently.

How Healthcare Systems Detect Duplicates

Healthcare organizations use a database called a Master Patient Index (MPI) to track every patient’s identity. In larger networks that span multiple hospitals and clinics, this becomes an Enterprise Master Patient Index (EMPI), which links various MRNs for one patient under a single umbrella identifier. The EMPI is the primary tool for catching and resolving duplicates.
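The linking role of an EMPI can be pictured as a small lookup structure. The sketch below is only an illustration under assumptions: the class name, method names, facility labels, and identifiers are all hypothetical, and a real EMPI product will differ considerably.

```python
class EnterpriseIndex:
    """Toy EMPI: one enterprise ID links every local MRN held for a patient."""

    def __init__(self):
        self._links = {}   # enterprise_id -> set of (facility, mrn)
        self._owner = {}   # (facility, mrn) -> enterprise_id

    def link(self, enterprise_id, facility, mrn):
        """Attach a facility-level MRN to an enterprise ID, moving it if needed."""
        key = (facility, mrn)
        previous = self._owner.get(key)
        if previous is not None and previous != enterprise_id:
            # The MRN was linked elsewhere; detach it from the old identifier.
            self._links[previous].discard(key)
        self._links.setdefault(enterprise_id, set()).add(key)
        self._owner[key] = enterprise_id

    def mrns_for(self, enterprise_id):
        """All (facility, mrn) pairs linked to one enterprise ID, sorted."""
        return sorted(self._links.get(enterprise_id, set()))

    def enterprise_id_for(self, facility, mrn):
        """The enterprise ID that owns a local MRN, or None if unlinked."""
        return self._owner.get((facility, mrn))


# Hypothetical example: one patient registered at two sites.
index = EnterpriseIndex()
index.link("E-0001", "hospital_a", "1001")
index.link("E-0001", "clinic_b", "77")
```

With both MRNs under the same enterprise identifier, a lookup from either site resolves to the same patient, which is the "single umbrella" behavior described above.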

Two main approaches power the matching logic behind these systems. Deterministic matching uses strict rules to compare records. If two records share the same Social Security number and address, for example, the system flags them as a match. This approach is straightforward but rigid. It can miss true duplicates when data contains typos, and it can incorrectly merge two different people who happen to share the compared values, such as a common name and birth date.
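Both failure modes of strict-rule matching are easy to reproduce. The following is a minimal sketch, not any vendor's rule set: the function name, the field names, and the sample records are invented for illustration.

```python
def deterministic_match(rec_a, rec_b, keys=("ssn", "address")):
    """True only if every key field is present and exactly equal in both records."""
    return all(rec_a.get(k) and rec_a.get(k) == rec_b.get(k) for k in keys)


# Made-up records: visit_1 and visit_2 are the same person, other is not.
visit_1 = {"name": "Maria Lopez", "birth_date": "1980-04-12",
           "ssn": "000-00-0001", "address": "12 Oak St"}
visit_2 = {"name": "Maria Lopez", "birth_date": "1980-04-12",
           "ssn": "000-00-0001", "address": "12 Oak Street"}
other = {"name": "Maria Lopez", "birth_date": "1980-04-12",
         "ssn": "000-00-0002", "address": "400 Pine Ave"}

# Same person, but "St" vs "Street" breaks the exact comparison.
missed_duplicate = not deterministic_match(visit_1, visit_2)

# Different people, but a rule keyed only on name and birth date merges them.
false_match = deterministic_match(visit_1, other, keys=("name", "birth_date"))
```

Here `missed_duplicate` and `false_match` both come out true: the first rule is too strict for the address variation, and the second is too loose to tell two people with the same name and birth date apart.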

Probabilistic matching takes a more flexible approach. It compares multiple demographic fields and assigns a score based on how likely two records are to belong to the same person. The Office of the National Coordinator for Health Information Technology recommends matching on at least name, birth date, phone number, and address. In one well-documented implementation, records scoring above 70% similarity were automatically matched, those below 63% were kept separate, and anything in between was flagged for manual review. This middle “uncertain” zone is where the most difficult cases land, often requiring staff to contact the patient directly or review clinical details to confirm identity.
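The three-way split described above can be sketched in a few lines. The 70% and 63% cutoffs come from the implementation mentioned in the text; everything else is an assumption made for illustration, including the field names and the scoring itself, which is a plain average of string similarities from Python's difflib rather than the statistically weighted scoring a production matcher would use.

```python
from difflib import SequenceMatcher

FIELDS = ("name", "birth_date", "phone", "address")  # hypothetical field names
AUTO_MATCH = 0.70      # above this: link automatically
AUTO_SEPARATE = 0.63   # below this: keep the records separate


def field_similarity(x, y):
    """Similarity of two field values in [0, 1]; a missing value scores 0."""
    x = str(x or "").strip().lower()
    y = str(y or "").strip().lower()
    if not x or not y:
        return 0.0
    return SequenceMatcher(None, x, y).ratio()


def record_similarity(rec_a, rec_b):
    """Unweighted average of per-field similarities (a stand-in for real weights)."""
    scores = [field_similarity(rec_a.get(f), rec_b.get(f)) for f in FIELDS]
    return sum(scores) / len(scores)


def classify(score):
    """Map a similarity score to one of the three outcomes."""
    if score > AUTO_MATCH:
        return "match"
    if score < AUTO_SEPARATE:
        return "separate"
    return "review"
```

Calling `classify(record_similarity(rec_a, rec_b))` on a pair of records returns "match", "separate", or "review". Scores that land exactly on either cutoff fall into the review zone here, which is one reasonable reading of "anything in between".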

The Financial Cost of Duplicates

Each duplicate record carries an estimated cost of $96 to resolve, covering the staff time needed to identify, review, and merge the records while verifying that no overlay has occurred. That figure may sound modest, but it scales quickly. A mid-sized hospital system with 500,000 records and a 10% duplication rate would be looking at 50,000 duplicates and about $4.8 million in remediation costs. Those numbers don’t account for the downstream financial impact of denied insurance claims, repeated diagnostic tests, and billing errors that duplicates cause before they’re caught.
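The arithmetic behind that estimate is simple enough to check directly. The function name below is hypothetical; the $96 per-record cost, the 500,000 records, and the 10% rate are the figures given above.

```python
def remediation_cost(total_records, duplicate_rate, cost_per_duplicate=96):
    """Return (number of duplicates, total cost in dollars to resolve them)."""
    duplicates = round(total_records * duplicate_rate)
    return duplicates, duplicates * cost_per_duplicate


duplicates, cost = remediation_cost(500_000, 0.10)
print(f"{duplicates:,} duplicates, ${cost:,} to remediate")
# prints: 50,000 duplicates, $4,800,000 to remediate
```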

Why the Definition Matters

Getting the definition right is more than an academic exercise. When healthcare professionals, software vendors, and administrators all use the same definition, they can measure the problem consistently, set benchmarks, and track improvement. A system that conflates duplicates with overlays will miscount both and misallocate resources. Knowing that a duplicate is specifically “one patient with multiple MRNs” keeps the focus on the root cause: failures in patient matching at the point of registration. That clarity drives better training for front-line staff, smarter investment in matching technology, and more accurate reporting of data quality across the organization.