What Is a Master Patient Index and How It Works

A master patient index (MPI) is a database that assigns each patient a single unique identifier and links all of their health records together, even when those records live in different systems like electronic health records, pharmacy databases, and billing software. It’s the backbone of how hospitals and health systems keep track of who’s who, ensuring that every lab result, prescription, and clinical note ends up attached to the right person.

How an MPI Works

Every time a patient checks in at a facility, their demographic information (name, date of birth, address, phone number, and sometimes a Social Security number) gets fed into the MPI. The system then runs a matching algorithm to determine whether that person already exists in the database. If a match is found, the new visit gets linked to the existing record. If no match exists, the MPI creates a new entry and assigns a unique patient identifier (UPI) that follows the patient across every future interaction.
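The check-in flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular MPI product works: the class name, the `UPI-` identifier format, and the matching rule (normalized name plus date of birth) are all assumptions made for the example, and a real system would use a far more tolerant matching algorithm.

```python
import itertools


class MasterPatientIndex:
    """Toy in-memory index mapping demographics to a unique patient identifier."""

    def __init__(self):
        self._upi_by_key = {}        # demographic key -> identifier
        self._visits_by_upi = {}     # identifier -> linked visits
        self._counter = itertools.count(1)

    @staticmethod
    def _key(name, birth_date):
        # Hypothetical matching rule: case- and whitespace-insensitive name
        # plus an exact date of birth. Real matching is much more forgiving.
        return (" ".join(name.lower().split()), birth_date)

    def check_in(self, name, birth_date, visit):
        key = self._key(name, birth_date)
        upi = self._upi_by_key.get(key)
        if upi is None:
            # No match found: create a new entry with a fresh identifier.
            upi = f"UPI-{next(self._counter):06d}"
            self._upi_by_key[key] = upi
            self._visits_by_upi[upi] = []
        # Existing or new, the visit is linked to the patient's identifier.
        self._visits_by_upi[upi].append(visit)
        return upi

    def visits(self, upi):
        return list(self._visits_by_upi[upi])


mpi = MasterPatientIndex()
first = mpi.check_in("Ana Diaz", "1980-02-14", "ER visit")
second = mpi.check_in("  ana  DIAZ ", "1980-02-14", "lab draw")
print(first == second)  # the second check-in is linked to the first record
```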

The MPI itself doesn’t store your full medical history. Think of it more like a cross-reference guide. It connects the dots between separate data sources so that when a provider pulls up your record, they see a complete picture rather than isolated fragments scattered across departments.
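One way to picture the cross-reference idea is as a table of pointers: the index holds only the patient identifier and the local identifiers each source system uses, while the clinical content stays where it is. The sketch below assumes made-up system names, local identifiers, and record contents purely for illustration.

```python
from collections import defaultdict

# Each source system keeps its own records under its own local identifier.
# All names and values here are invented for the example.
SOURCE_SYSTEMS = {
    "ehr": {"E-1001": {"note": "annual physical"}},
    "pharmacy": {"RX-77": {"prescription": "amoxicillin"}},
    "billing": {"B-5": {"invoice": "2024-03"}},
}


class CrossReference:
    """Stores pointers (system, local id) per patient, never the records."""

    def __init__(self):
        self._links = defaultdict(list)

    def link(self, upi, system, local_id):
        self._links[upi].append((system, local_id))

    def pointers(self, upi):
        # .get avoids creating an empty entry for unknown identifiers.
        return list(self._links.get(upi, []))

    def assemble(self, upi, sources):
        # Follow each pointer into its source system to build one view.
        return [
            (system, sources[system][local_id])
            for system, local_id in self.pointers(upi)
        ]


xref = CrossReference()
xref.link("UPI-000001", "ehr", "E-1001")
xref.link("UPI-000001", "pharmacy", "RX-77")
for system, record in xref.assemble("UPI-000001", SOURCE_SYSTEMS):
    print(system, record)
```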

MPI vs. Enterprise Master Patient Index

A standard MPI typically serves a single hospital or clinic. An enterprise master patient index (EMPI) does the same job but across multiple institutions. If a health system operates ten hospitals and dozens of outpatient clinics, the EMPI ensures that a patient who visits three of those facilities is recognized as one person with one unified record, not three separate people.

Some states have taken this a step further. Connecticut, for example, operates a statewide EMPI that uniquely identifies individuals across multiple agencies and settings to enable real-time health information exchange. The goal is to ensure each person is represented once across all participating systems, which supports better coordination when patients receive care from different providers or programs.

The Data That Identifies You

MPIs rely on a combination of demographic fields to distinguish one patient from another. The most common identifiers include your full name, date of birth, address, phone number, and gender. Some systems also use Social Security numbers, though privacy regulations increasingly limit this practice.

None of these identifiers are perfectly stable. People change their names after marriage or divorce, move to new addresses, and switch phone numbers. Twins share birth dates and sometimes similar names. Immunization registries, for instance, have to plan specifically for the near-identical records that twins create. These realities make patient matching far more complicated than it might seem on the surface.

Deterministic vs. Probabilistic Matching

MPIs use one of two main approaches to decide whether two records belong to the same person. Deterministic matching follows rigid, predefined rules. A rule might say: if the Social Security number and address match exactly, these are the same patient. It’s straightforward, but it struggles with records that contain typos, outdated information, or missing fields.
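The example rule above is simple enough to write out directly. The following sketch assumes records are plain dictionaries with hypothetical `ssn` and `address` keys; it shows both the appeal of the approach and why a single typo or missing field is enough to defeat it.

```python
def deterministic_match(rec_a, rec_b, required=("ssn", "address")):
    """Same patient only if every required field is present and identical."""
    for field_name in required:
        a = rec_a.get(field_name)
        b = rec_b.get(field_name)
        if not a or not b:
            return False  # a missing field can never satisfy an exact rule
        if a != b:
            return False  # any difference, even a typo, breaks the match
    return True


on_file = {"ssn": "000-12-3456", "address": "12 Oak St"}
same = {"ssn": "000-12-3456", "address": "12 Oak St"}
typo = {"ssn": "000-12-3456", "address": "12 Oak Street"}
no_ssn = {"address": "12 Oak St"}

print(deterministic_match(on_file, same))    # True
print(deterministic_match(on_file, typo))    # False
print(deterministic_match(on_file, no_ssn))  # False
```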

Probabilistic matching takes a statistical approach. Instead of requiring exact matches, it calculates the likelihood that two records refer to the same person based on how closely their fields align. A pair of records whose name, birth date, and address all partially match might score above 70%, which counts as a positive match. A score below 63% flags the records as different patients, while anything in between lands in an uncertain zone that requires human review.
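The three-way decision can be sketched as a weighted similarity score compared against the two thresholds mentioned above. This is a simplified stand-in, not the statistical model a production MPI would use: the field weights are assumptions, and string similarity comes from Python's standard `difflib.SequenceMatcher` rather than from match probabilities estimated on real data.

```python
from difflib import SequenceMatcher

# Assumed weights for illustration; real systems derive them from data.
WEIGHTS = {"name": 0.4, "birth_date": 0.4, "address": 0.2}
MATCH_ABOVE = 0.70      # above this: treat as the same patient
DIFFERENT_BELOW = 0.63  # below this: treat as different patients


def field_similarity(a, b):
    """Similarity in [0, 1]; a missing value contributes nothing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b):
    """Weighted average of per-field similarities."""
    weighted = sum(
        weight * field_similarity(rec_a.get(name), rec_b.get(name))
        for name, weight in WEIGHTS.items()
    )
    return weighted / sum(WEIGHTS.values())


def classify(score):
    if score > MATCH_ABOVE:
        return "match"
    if score < DIFFERENT_BELOW:
        return "different"
    return "review"  # uncertain zone: send to a human reviewer


a = {"name": "Maria Gonzalez", "birth_date": "1980-02-14", "address": "12 Oak St"}
b = {"name": "Maria Gonzales", "birth_date": "1980-02-14", "address": "12 Oak Street"}
print(classify(match_score(a, b)))  # typos alone do not prevent a match here
```

Unlike an exact rule, the score degrades gradually, so small spelling differences lower confidence without discarding the pair outright.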

A study published in the Journal of Digital Imaging compared the two methods across millions of patient record pairs. When researchers applied a probabilistic scoring system to records that a deterministic system had already matched, they found that roughly 0.85% of those matches were uncertain and another 0.41% were outright mismatches. In a system with nearly 10 million matched pairs, 0.41% works out to about 40,000 pairs linked to the wrong person. Both methods produced low false-positive rates (around 0.05%), but the probabilistic approach handled complex typos and error patterns more effectively.

What Goes Wrong Without Accurate Matching

Duplicate records are the most common problem. A single hospital typically has a duplicate rate between 5% and 10%, meaning that up to one in ten records may duplicate another record already in the system. Each duplicate pair costs an organization an estimated $50 in hidden operational expenses, covering the staff time needed to investigate, merge, and correct the files. For a large hospital with hundreds of thousands of records, those costs add up quickly.
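A rough calculation shows how quickly those costs accumulate. The sketch below uses the figures quoted above (a 5% to 10% duplicate rate and $50 per duplicate pair); the 500,000-record hospital is a hypothetical, and treating each duplicate record as one pair to resolve is a simplifying assumption.

```python
def duplicate_cleanup_cost(total_records, duplicate_rate_pct, cost_per_pair=50):
    """Estimated cleanup cost in dollars.

    Assumes each duplicate record forms one pair that must be resolved.
    Integer percentages keep the arithmetic exact.
    """
    duplicate_pairs = total_records * duplicate_rate_pct // 100
    return duplicate_pairs * cost_per_pair


records = 500_000  # hypothetical large hospital
low = duplicate_cleanup_cost(records, 5)
high = duplicate_cleanup_cost(records, 10)
print(f"${low:,} to ${high:,}")  # $1,250,000 to $2,500,000
```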

The clinical risks are more serious. When a patient has two separate records, critical information can get split between them. A provider looking at one record might not see an allergy documented in the other, or might miss a recent prescription that creates a dangerous drug interaction. Patient-matching problems are a major source of data integrity issues in electronic health records, and they contribute to deaths from medical errors. Duplicate records also lead to redundant testing, since a provider who can’t find a previous result will simply order the test again.

How Modern Systems Exchange Identity Data

For an MPI to work across different software platforms, it needs a common language. The most widely adopted standard for this is FHIR (Fast Healthcare Interoperability Resources), maintained by HL7, the organization that sets healthcare data standards. FHIR defines a specific operation called “$match” that allows any healthcare application to send patient information to an MPI and receive back a ranked list of potential matches, scored from 0 to 1 based on confidence level.

The patient information submitted doesn’t have to be complete. A system might send only a name and date of birth, and the MPI will return whatever matches it can find, ordered from most likely to least likely. If no matches exist, the MPI returns an empty result rather than an error. This flexibility is important because patient data often arrives in fragments, especially during emergencies or when records are transferred between unrelated health systems.
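As a rough sketch of what such an exchange looks like, the code below builds a request body for the FHIR Patient `$match` operation and ranks the candidates in a response. In FHIR the request is a Parameters resource, typically sent with a POST to `[base]/Patient/$match`, and the response is a Bundle whose entries carry a `search.score`. The helper names, patient details, and sample response are made up for illustration, and no network call is made.

```python
import json


def build_match_request(family, birth_date, given=None,
                        only_certain=False, count=5):
    """Build a Parameters body for Patient/$match from partial demographics."""
    name = {"family": family}
    if given:
        name["given"] = [given]
    patient = {"resourceType": "Patient", "name": [name], "birthDate": birth_date}
    return {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "resource", "resource": patient},
            {"name": "onlyCertainMatches", "valueBoolean": only_certain},
            {"name": "count", "valueInteger": count},
        ],
    }


def ranked_candidates(bundle):
    """Return (patient id, score) pairs, most likely match first."""
    scored = [
        (entry["resource"]["id"], entry.get("search", {}).get("score", 0.0))
        for entry in bundle.get("entry", [])
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented response for illustration; a real server returns full resources.
sample_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}, "search": {"score": 0.61}},
        {"resource": {"resourceType": "Patient", "id": "p2"}, "search": {"score": 0.93}},
    ],
}

print(json.dumps(build_match_request("Gonzalez", "1980-02-14"), indent=2))
print(ranked_candidates(sample_bundle))
```

A Bundle with no entries simply yields an empty list, which mirrors the empty-result behavior described above.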

Why MPIs Matter for Health Information Exchange

Health information exchanges (HIEs) allow providers across different organizations to share patient data electronically. An EMPI is what makes this possible at scale. Without one, a cardiologist in one health system and a primary care doctor in another would have no reliable way to confirm they’re looking at the same patient’s records.

Institutions that maintain well-managed MPIs are considered more desirable partners for data sharing projects and patient registries, because they can draw from a broader, more reliable pool of data than any single system alone. The MPI doesn’t replace the need for good data entry practices or identity verification at the front desk. It’s an infrastructure layer, a tool that applies whatever identity management strategies an organization has chosen. But without it, coordinating care across multiple providers, departments, or institutions becomes vastly more error-prone.