What Is a Limited Data Set Under HIPAA?

A limited data set (LDS) is a category of health information under HIPAA that falls between fully identifiable patient records and completely de-identified data. It strips out direct identifiers like names and Social Security numbers but keeps certain useful details like dates and zip codes, making it valuable for research, public health, and healthcare operations. Because it still carries some re-identification risk, sharing a limited data set requires a legally binding data use agreement between the parties involved.

What Gets Removed and What Stays

To qualify as a limited data set, protected health information must have all direct identifiers removed, both those of the individual and those of the individual’s relatives, employers, and household members. The identifiers that must be stripped out include names, street addresses (though town, city, state, and zip code can remain), phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate and license numbers, vehicle identifiers and serial numbers, device identifiers and serial numbers, web URLs, IP addresses, biometric identifiers like fingerprints, and full-face photographs or comparable images.
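As a rough illustration of the removal step, the sketch below drops identifier fields from a single record. The field names, the sample record, and the function to_limited_data_set are all hypothetical; a real schema will differ, and free-text fields or data about relatives and employers would need separate review that this sketch does not attempt.

```python
# Hypothetical field names standing in for the direct identifiers listed above.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_beneficiary_number",
    "account_number", "certificate_or_license_number",
    "vehicle_identifier", "device_identifier", "url", "ip_address",
    "biometric_identifier", "full_face_photo",
}


def to_limited_data_set(record: dict) -> dict:
    """Return a copy of the record without the direct-identifier fields."""
    return {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIER_FIELDS
    }


# Made-up sample record, for illustration only.
patient = {
    "name": "Jane Example",
    "street_address": "1 Placeholder Road",
    "city": "Springfield",
    "state": "IL",
    "zip": "62701",
    "ssn": "000-00-0000",
    "date_of_birth": "1980-04-12",
    "admission_date": "2023-01-05",
}

lds_record = to_limited_data_set(patient)
print(sorted(lds_record))
```

The name, street address, and Social Security number are dropped, while city, state, zip code, and the dates pass through unchanged.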

What makes an LDS distinct is what it keeps. Dates of birth, admission, discharge, and death can all stay in the data. So can city, state, and zip code. These details are enormously useful for researchers studying disease patterns across time and geography, tracking seasonal trends, or following patients through multi-year treatment courses. HIPAA’s Safe Harbor method for full de-identification, by contrast, requires reducing these dates to the year alone and limiting geography below the state level to the first three digits of a zip code (replaced with 000 where those three digits cover 20,000 or fewer people), which can make certain types of analysis impossible.
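To make the contrast concrete, here is a minimal sketch of the two Safe Harbor generalizations mentioned above, applied to fields that a limited data set could keep intact. The function name, the record layout, and the set of low-population prefixes are assumptions made for illustration; the prefix set is a placeholder, not real census data, and Safe Harbor has further requirements that are not shown.

```python
from datetime import date


def safe_harbor_dates_and_zip(record: dict, low_population_prefixes: set) -> dict:
    """Reduce dates to the year and the zip code to its first three digits."""
    prefix = record["zip"][:3]
    return {
        "birth_year": record["date_of_birth"].year,
        "admission_year": record["admission_date"].year,
        # Prefixes covering too few people are replaced with "000".
        "zip3": "000" if prefix in low_population_prefixes else prefix,
    }


# Placeholder value only; a real list would come from census figures.
PLACEHOLDER_LOW_POPULATION_PREFIXES = {"555"}

# Made-up record holding the detail a limited data set may retain.
record = {
    "date_of_birth": date(1980, 4, 12),
    "admission_date": date(2023, 1, 5),
    "zip": "62701",
}

generalized = safe_harbor_dates_and_zip(record, PLACEHOLDER_LOW_POPULATION_PREFIXES)
print(generalized)
```

The month and day of each date and the last two zip digits are gone from the generalized view, which is the precision a limited data set preserves.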

How It Differs From De-Identified Data

The critical legal distinction: a limited data set is still considered protected health information under HIPAA. De-identified data is not. Once health information is properly de-identified using either the Safe Harbor method or the Expert Determination method, it falls outside the Privacy Rule entirely. A covered entity can share de-identified data with anyone, for any purpose, with no restrictions under the Privacy Rule.

A limited data set, by contrast, can only be disclosed for three specific purposes: research, public health activities, and healthcare operations. Every disclosure requires a signed data use agreement. The recipient faces binding legal obligations about how they handle the information.

Both approaches carry some residual risk of re-identification. Even properly de-identified data retains a small, nonzero chance that someone could link it back to a specific patient. The difference is that de-identified data has no ongoing regulatory oversight unless the covered entity voluntarily imposes it, while a limited data set remains governed by HIPAA’s protections throughout its use.
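The disclosure rules described in this section can be sketched as a simple gate. This is an illustrative model of the Privacy Rule conditions named above, not a compliance tool; the enum, the purpose labels, and the function disclosure_allowed are hypothetical names, and other laws or institutional policies may add restrictions the sketch ignores.

```python
from enum import Enum


class DataKind(Enum):
    DE_IDENTIFIED = "de-identified"
    LIMITED_DATA_SET = "limited data set"


# The three purposes for which a limited data set may be disclosed.
PERMITTED_LDS_PURPOSES = {"research", "public_health", "health_care_operations"}


def disclosure_allowed(kind: DataKind, purpose: str, has_data_use_agreement: bool) -> bool:
    """Model the Privacy Rule conditions described in the text above."""
    if kind is DataKind.DE_IDENTIFIED:
        # Properly de-identified data falls outside the Privacy Rule.
        return True
    # A limited data set needs both a permitted purpose and an agreement.
    return purpose in PERMITTED_LDS_PURPOSES and has_data_use_agreement


print(disclosure_allowed(DataKind.LIMITED_DATA_SET, "research", True))
print(disclosure_allowed(DataKind.LIMITED_DATA_SET, "marketing", True))
print(disclosure_allowed(DataKind.LIMITED_DATA_SET, "research", False))
```

Only the first call passes: the second fails on purpose and the third on the missing agreement.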

The Data Use Agreement Requirement

No covered entity (a hospital, insurer, or clinic, for example) can hand over a limited data set without a data use agreement in place. This isn’t optional or a best practice. It’s a regulatory requirement under the Privacy Rule. The agreement must spell out who is allowed to use and receive the data, and exactly what uses and disclosures are permitted.

The recipient must agree to several specific conditions:

  • Restricted use: The information can only be used as the agreement permits, or as otherwise required by law.
  • Safeguards: The recipient must use appropriate protections to prevent unauthorized use or disclosure.
  • Breach reporting: Any use or disclosure that violates the agreement must be reported back to the covered entity.
  • Downstream accountability: If the recipient shares the data with agents or subcontractors, those parties must agree to the same restrictions.
  • No re-identification: The recipient cannot attempt to identify individuals in the data or contact them.

That last point is particularly important. A limited data set is not de-identified, so the entire framework depends on recipients not using the retained dates and geographic details to work out who the individuals are. The data use agreement makes that prohibition explicit and legally enforceable.
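One way to picture the required contents of a data use agreement is as a checklist. The sketch below is a hypothetical representation, with invented class and field names; it only mirrors the conditions listed in this section and says nothing about whether an actual agreement is legally adequate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataUseAgreement:
    permitted_uses: tuple          # uses and disclosures the agreement allows
    authorized_recipients: tuple   # who may use or receive the data
    restricts_use: bool
    requires_safeguards: bool
    requires_violation_reporting: bool
    binds_agents_and_subcontractors: bool
    prohibits_reidentification_and_contact: bool


REQUIRED_TERMS = (
    "permitted_uses",
    "authorized_recipients",
    "restricts_use",
    "requires_safeguards",
    "requires_violation_reporting",
    "binds_agents_and_subcontractors",
    "prohibits_reidentification_and_contact",
)


def missing_terms(agreement: DataUseAgreement) -> list:
    """List the checklist items that are empty or False."""
    return [name for name in REQUIRED_TERMS if not getattr(agreement, name)]


# Made-up draft that omits the re-identification prohibition.
draft = DataUseAgreement(
    permitted_uses=("readmission study",),
    authorized_recipients=("research team",),
    restricts_use=True,
    requires_safeguards=True,
    requires_violation_reporting=True,
    binds_agents_and_subcontractors=True,
    prohibits_reidentification_and_contact=False,
)

print(missing_terms(draft))
```

Here the draft is flagged for the one missing term, the ban on re-identification and contact.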

Why Researchers Use Limited Data Sets

Researchers often choose a limited data set over fully de-identified data because the retained information, especially dates and geographic details, is essential for many study designs. Epidemiologists tracking disease outbreaks need to know when patients were admitted and where they live at the city or zip code level. Longitudinal studies following patients over years of treatment need exact dates to calculate intervals between visits, measure time to relapse, or estimate survival curves. Stripping dates down to just the year, as Safe Harbor requires, can make the data useless for these purposes.

The tradeoff is administrative. Using a limited data set means negotiating a data use agreement, accepting ongoing compliance obligations, and maintaining safeguards that wouldn’t apply to de-identified data. For researchers who need temporal or geographic precision, that overhead is worth it. For studies where approximate dates and broad regions are sufficient, fully de-identified data is simpler to obtain and use.
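A small worked example shows what is lost when dates are reduced to the year. The two patients and their dates are invented for illustration, and the helper names are hypothetical.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Interval in days, available when exact dates are retained."""
    return (end - start).days


def year_gap(start: date, end: date) -> int:
    """The only interval recoverable from year-only dates."""
    return end.year - start.year


# Invented (treatment start, relapse) dates for two patients.
patient_a = (date(2021, 11, 20), date(2022, 2, 3))
patient_b = (date(2021, 1, 10), date(2022, 12, 15))

for label, (start, relapse) in (("A", patient_a), ("B", patient_b)):
    print(label, days_between(start, relapse), year_gap(start, relapse))
```

With exact dates, patient A relapses after 75 days and patient B after 704 days. With year-only dates, both show the same gap of one year, so the difference between an early and a late relapse disappears.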

Who Creates and Shares Limited Data Sets

Covered entities, meaning healthcare providers, health plans, and healthcare clearinghouses, are the organizations that create and disclose limited data sets. They’re responsible for properly removing all required identifiers before sharing the data. They’re also responsible for ensuring a valid data use agreement is in place with every recipient.

The recipients are typically university researchers, public health departments, or other organizations within the healthcare system conducting quality improvement or operational analysis. A hospital system might share a limited data set with a research team studying readmission patterns, or a public health agency might receive one to monitor regional disease trends. In each case, the data retains enough detail to be analytically useful while omitting the identifiers most likely to expose a patient’s identity.