Viewpoint Paper
a Department of Biomedical Informatics, Columbia University, New York, NY
b Institute for Urban Family Health, New York, NY
* Correspondence: George Hripcsak, MD, MS, 622 W 168 Street, VC5, New York, NY 10032 (Email: hripcsak{at}columbia.edu).
Received for publication: 07/14/06; accepted for publication: 06/06/08.
| Abstract |
|---|
| Introduction |
|---|
The expanded use of clinical information systems and the advent of health information exchange networks also make possible broader and more flexible sharing of clinical data with public health departments.5,6 Expanding public health surveillance beyond notifiable conditions to include routine reporting of symptoms, diagnoses, procedures, laboratory data, ancillary reports, and similar data may open enormous opportunities. This new capability could improve the traditional detection of and response to disease outbreaks. It could enable public health authorities to detect outbreaks sooner, expand case-finding, and monitor the size, spread, and tempo of outbreaks once detected. It could also help them quantify morbidity and impact and monitor the efficacy of interventions. Even greater potential may be realized by moving beyond infectious disease surveillance to a more active role for public health officials in monitoring priority public health issues such as cancer screening, adult immunizations, and the screening and management of diabetes, lipid disorders, and HIV.
These heightened capabilities bring special challenges in ensuring patient privacy and confidentiality.7 The Health Insurance Portability and Accountability Act of 1996 (HIPAA) led to national health information privacy standards in the form of federal regulations,8 which govern such disclosures. Such privacy standards are critical, but they must not prevent the sharing of information for the public good.9 The Centers for Disease Control and Prevention has provided an excellent review of the privacy regulations and their effect on traditional public health surveillance.10 However, acceptable approaches to data sharing that goes beyond traditional mandated reporting must also be defined. State and local privacy statutes can be more restrictive than the HIPAA privacy regulations (hereinafter referred to simply as "HIPAA"). With that caveat, this paper reviews the privacy requirements of non-mandated, large-scale data sharing between clinical data sources and public health. It then discusses the HIPAA implications of those requirements and describes several approaches to accomplishing expanded reporting in the context of HIPAA.
| Requirements |
|---|
Patient Anonymity
Some forms of public health reporting, such as tracking the overall quality of care delivered in a region, may not require that individual patients be identified. If the health department does not need patient identities, then it is safest to send the department anonymized data. This can be done either by sending individual-level data that cannot be traced to individuals or by sending aggregated data. Ensuring true patient anonymity is not trivial,11,12 although mechanisms to distribute data that are not linkable to identities are being defined.12,13
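The following is a minimal sketch of aggregation at the source. Individual-level records are collapsed into counts before anything leaves the facility. The field names (`visit_date`, `zip3`, `syndrome`) and the function name `aggregate_counts` are hypothetical. Counting alone does not guarantee anonymity.

```python
from collections import Counter
from typing import Iterable, Mapping


def aggregate_counts(
    records: Iterable[Mapping[str, str]],
) -> dict[tuple[str, str, str], int]:
    """Collapse individual-level records into counts per (date, 3-digit zip, syndrome).

    Field names are hypothetical; only these counts, not the records
    (which may carry identifiers such as an MRN), would be sent.
    """
    counts = Counter((r["visit_date"], r["zip3"], r["syndrome"]) for r in records)
    return dict(counts)
```

Only the keys chosen for aggregation survive. An identifier such as a medical record number in the input never appears in the output.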
Record Matching
Certain forms of surveillance and reporting require matching data that come from different entities. For example, to monitor health care quality, it may be necessary to look across institutions to tell whether proper preventive care was administered and whether some of a patient's medications interact. Diagnoses may come from a health care provider, clinical tests may come from a laboratory, and medications may come from a pharmacy benefit manager; ideally, data about the same patient should be coordinated. Therefore, although the patient need not be identified, it will be helpful to be able to correlate data about the same patient from different entities.
Re-identifying Patients
When a case is identified as being part of a cluster or potential outbreak, it may be necessary to identify the patient to confirm the case, to administer treatment, or to prevent the spread of disease. Therefore, although patient identities may not need to be attached to the data that are sent to the public health department, there may need to be a mechanism to identify the patient when it is appropriate.
Geographic Localization
Patient addresses may be important in surveillance that uses geospatial clustering. Clustering algorithms tend to perform better when addresses are known at finer granularity.14 Depending on the context, zip codes or street addresses may be beneficial. However, detailed street addresses can be combined with publicly available information to identify patients. Recent research has demonstrated a method for anonymizing patients' geographic locations while preserving the ability to detect spatial clusters;15 this may reduce the need for detailed addresses.
Temporal Localization
Certain dates, such as the date of visit, can be critical in public health reporting. In fact, if real-time reporting is supported, the date of visit may be inferred from the date of the report. For syndromic surveillance, the date of the visit is essential. For monitoring the quality of care, the relative dates of admission and procedures may become important. Detailed dates present some risk of uncovering patient identities,11,12 although it is difficult to identify patients without access to care providers' registration databases.
Patient Characteristics
Other patient characteristics, such as age, gender, and race, can be important in public health reporting. Birth date can be used to identify a patient. However, age in years (or in months for infants) is generally sufficient for reporting purposes and presents a much smaller risk of identification. Nevertheless, it has been shown that seemingly general characteristics like age may be combined to identify patients, and that additional procedures may be necessary to achieve true anonymization.16
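The sketch below reports age instead of birth date. It computes completed years, or completed months below a cutoff. The name `reporting_age` and the one-year default for `infant_cutoff_years` are assumptions, not a reporting standard.

```python
from datetime import date


def reporting_age(
    birth_date: date, visit_date: date, infant_cutoff_years: int = 1
) -> tuple[int, str]:
    """Return (age, unit) so that only age, not birth date, is reported.

    Ages below the hypothetical `infant_cutoff_years` threshold are
    given in completed months; otherwise in completed years.
    """
    months = (visit_date.year - birth_date.year) * 12 + (
        visit_date.month - birth_date.month
    )
    if visit_date.day < birth_date.day:
        months -= 1  # the current month of life is not yet complete
    if months < infant_cutoff_years * 12:
        return months, "months"
    return months // 12, "years"
```

For example, a child born 2007-11-20 and seen on 2008-06-06 would be reported as 6 months old.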
| HIPAA Implications |
|---|
HIPAA provides a number of other exceptions to authorization that are potentially relevant to public health reporting; they are summarized in Table 1. Disclosures required by law, such as mandated disease reporting, are permitted without authorization (HIPAA Section 164.512(a)).8 These disclosures may include patient identifiers such as names, detailed addresses, and detailed dates.
HIPAA allows clinical information to be disclosed if it has been de-identified. It defines a safe harbor: if 18 types of identifiers are removed, the data are considered de-identified (HIPAA Section 164.514(b)).8 In addition to identifiers like names, the safe harbor forbids dates more detailed than the year and addresses more detailed than the first three digits of the zip code (in most areas). This renders de-identified data less useful for many public health surveillance purposes. Alternatively, a data set may be considered de-identified if a qualified statistician determines that the risk of identifying individuals is very small.
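The following is a partial sketch of safe-harbor generalization. It assumes a record with hypothetical fields `visit_date` (a `datetime.date`) and `zip` (a five-digit string). It does three things:

- drops a few direct identifiers;
- reduces the date to a year;
- truncates the zip code to three digits, substituting `"000"` for sparsely populated three-digit areas. Under the safe harbor, these are areas with 20,000 or fewer residents.

The caller must supply that set of restricted prefixes from current census-based guidance. The block covers only a handful of the 18 identifier types.

```python
from datetime import date

# Illustrative subset only; the safe harbor lists 18 identifier types.
DIRECT_IDENTIFIERS = ("name", "mrn", "ssn", "street_address", "phone")


def safe_harbor(record: dict, restricted_zip3: frozenset[str]) -> dict:
    """Return a generalized copy of `record` (the input is not modified)."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    visit: date = out.pop("visit_date")
    out["visit_year"] = visit.year  # no date detail beyond the year
    zip3 = out.pop("zip")[:3]
    # Sparsely populated 3-digit areas (supplied by the caller) become "000".
    out["zip3"] = "000" if zip3 in restricted_zip3 else zip3
    return out
```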
To address the limitations of de-identified data, HIPAA defines a limited data set (HIPAA Section 164.514(e)).8 A limited data set excludes direct identifiers like name, but it allows detailed dates and five-digit zip codes. The entities involved in the disclosure must enter into a data use agreement. The agreement specifies who will receive the data and assures that the data will not be further disclosed and that the recipient will not attempt to re-identify them. The disclosure must still meet the minimum necessary standard. Even so, the limited data set definition appears to be a good match for the minimum necessary to carry out most public health surveillance that is not specifically mandated (dates and five-digit zip codes, but no direct patient identifiers).
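The data use agreement and the minimum necessary standard can be pictured as a field allow-list. In this hypothetical sketch, `DataUseAgreement` records the recipient and the permitted fields. `to_limited_data_set` keeps only those fields and never releases anything on an illustrative (incomplete) list of direct identifiers. Dates and five-digit zip codes pass through if the agreement permits them.

```python
from dataclasses import dataclass

# Illustrative, incomplete list of direct identifiers a limited data set excludes.
DIRECT_IDENTIFIERS = frozenset({"name", "mrn", "ssn", "street_address", "phone", "email"})


@dataclass(frozen=True)
class DataUseAgreement:
    recipient: str
    permitted_fields: frozenset[str]  # the "minimum necessary" for the stated purpose


def to_limited_data_set(record: dict, dua: DataUseAgreement) -> dict:
    # Even if an agreement mistakenly permits a direct identifier, drop it.
    allowed = dua.permitted_fields - DIRECT_IDENTIFIERS
    return {k: v for k, v in record.items() if k in allowed}
```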
Finally, if a clinical research project includes the transfer of clinical data to public health, disclosures can be made without authorization if an Institutional Review Board grants a waiver of HIPAA authorization (HIPAA Section 164.512(i)).8 This applies only to a bona fide clinical study, however.
Mechanisms for Re-identifying and Matching Patients
HIPAA includes provisions for re-identifying patients under those mechanisms in Table 1 that do not include direct patient identifiers. A de-identified data set cannot contain direct patient identifiers. It may, however, include a code maintained by the disclosing entity that can be used to re-identify a patient, subject to three conditions:20 the code is not derived from patient identifiers, it is not used for other purposes, and the entity does not disclose the code-to-patient mapping. Thus, the provider's software could generate and maintain a random code unique to each patient. If a patient needed to be re-identified, a public health authority could supply the provider with the code. The provider could then notify the patient or report the patient to the health authority with full identifiers as a mandatory case report.
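The following is a minimal sketch of a re-identification code registry kept at the data source. It assumes patients are keyed by a local medical record number. Codes come from Python's `secrets` module, so they are random rather than derived from identifiers. The class and method names are hypothetical.

```python
import secrets
from typing import Optional


class ReidentificationRegistry:
    """Kept only at the data source; the code-to-patient mapping is never disclosed."""

    def __init__(self) -> None:
        self._code_by_mrn: dict[str, str] = {}
        self._mrn_by_code: dict[str, str] = {}

    def code_for(self, mrn: str) -> str:
        """Return the patient's stable code, creating a random one on first use."""
        if mrn not in self._code_by_mrn:
            code = secrets.token_hex(16)  # random; carries no patient information
            while code in self._mrn_by_code:  # guard against an unlikely collision
                code = secrets.token_hex(16)
            self._code_by_mrn[mrn] = code
            self._mrn_by_code[code] = mrn
        return self._code_by_mrn[mrn]

    def resolve(self, code: str) -> Optional[str]:
        """Map a code supplied by the health department back to the local MRN."""
        return self._mrn_by_code.get(code)
```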
The re-identification process can be automated. For example, if a surveillance alert is generated, then the re-identification code can be sent to the source facility electronically and adjudicated by the facility's information system, potentially generating an alert to the patient's provider or to the patient.
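The automated step might look like the following sketch. The source facility receives an alert carrying a re-identification code, resolves it against its private mapping, and produces a notification for the patient's provider. The message types and the `adjudicate` function are hypothetical. A real system would receive alerts over an authenticated interface and log each step.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SurveillanceAlert:
    reid_code: str  # the only patient reference the health department holds
    reason: str


@dataclass
class ProviderNotification:
    mrn: str  # resolved locally; never sent back to the health department
    message: str


def adjudicate(
    alert: SurveillanceAlert, code_to_mrn: Mapping[str, str]
) -> Optional[ProviderNotification]:
    mrn = code_to_mrn.get(alert.reid_code)
    if mrn is None:
        return None  # unknown code: nothing to route (a real system would log it)
    return ProviderNotification(mrn=mrn, message=f"Public health alert: {alert.reason}")
```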
The regulation states that the limited data set recipient (the public health department in this scenario) must not identify the information or contact the individual (HIPAA Section 164.514(e)).8 It also makes clear that the covered entity (the data source) may re-identify the data using a unique code (HIPAA Section 164.514(c)).8 We interpret this to mean that a health department using a limited data set mechanism may not attempt to identify patients in the limited data set itself. It may, however, supply a re-identification code to the data source, which can re-identify the patient and take appropriate action. For example, suppose the health department detects a case of a reportable disease in a limited data set. It may inform the source provider of the case using the re-identification code. The provider may then identify the patient and take whatever action is appropriate, including reporting the identified case to the public health department under the regular mandatory case reporting provisions.
If no re-identification code is available, a provider organization may still be able to infer who the patient of interest is, for example from its log of transmissions. Similarly, if a health department detects a disease case via de-identified laboratory data, it could demand that the provider organization review its own laboratory data to uncover the case.
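For the laboratory example, the provider's review can be sketched as a filter over its own results. The sketch assumes the de-identified report carries the test, result, collection date, and three-digit zip code. The `LabResult` type and `candidate_patients` function are hypothetical. More than one candidate may match, in which case manual review would still be needed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LabResult:
    mrn: str
    test: str
    result: str
    collected: date
    zip3: str


def candidate_patients(
    local_labs: list[LabResult], test: str, result: str, collected: date, zip3: str
) -> set[str]:
    """MRNs whose local results match every attribute in the de-identified report."""
    wanted = (test, result, collected, zip3)
    return {r.mrn for r in local_labs if (r.test, r.result, r.collected, r.zip3) == wanted}
```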
The matching of patient data is similar. Mechanisms that allow direct patient identifiers support the matching of patients across health care providers, at least within the limits of data accuracy and completeness. For the other mechanisms, HIPAA does not explicitly support matching patients across health care providers, but its re-identification provisions can be used. De-identified data could in theory be matched if the re-identification codes were coordinated across institutions. One way to do this would be for all the health care providers in an area to share a common security broker (via a business associate agreement) that generates and maintains unique re-identification codes.
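The following is a hypothetical sketch of such a broker:

1. Each provider sends identifiers to the broker under its business associate agreement.
2. The broker assigns one random code per person.
3. The broker remembers which provider records share each code, so that a code can be routed back to the source for re-identification.

Matching on an exact normalized name, birth date, and sex is a simplifying assumption. Real record linkage is usually more tolerant of errors.

```python
import secrets


class SecurityBroker:
    """Hypothetical regional broker acting as business associate for every provider."""

    def __init__(self) -> None:
        self._code_by_person: dict[tuple[str, str, str], str] = {}
        # Which (provider, local MRN) records each code refers to.
        self._records_by_code: dict[str, set[tuple[str, str]]] = {}

    @staticmethod
    def _person_key(name: str, birth_date: str, sex: str) -> tuple[str, str, str]:
        # Assumption: exact match after collapsing whitespace and case.
        return (" ".join(name.upper().split()), birth_date.strip(), sex.strip().upper())

    def code_for(
        self, provider: str, local_mrn: str, name: str, birth_date: str, sex: str
    ) -> str:
        key = self._person_key(name, birth_date, sex)
        if key not in self._code_by_person:
            self._code_by_person[key] = secrets.token_hex(16)  # random, not derived
        code = self._code_by_person[key]
        self._records_by_code.setdefault(code, set()).add((provider, local_mrn))
        return code

    def records_for(self, code: str) -> set[tuple[str, str]]:
        """Provider records behind a code, for routing a re-identification request."""
        return set(self._records_by_code.get(code, set()))
```

Providers would then attach only the shared code to data sent to the health department, so records from different providers can be linked there without identifiers.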
A limited data set is slightly more flexible. It may include a code derived from patient identifiers, as long as the patient identifiers cannot be reconstituted directly from the code.20 One example of such a code is a "perfect one-way hash."21 A one-way hash function is an approved mathematical algorithm that produces a character string (a "hash") for any given input string but cannot be reversed; that is, the original input cannot be reproduced from the hash. A "perfect" one-way hash function is one in which the generated hash is unique: two different inputs never map to the same hash. Therefore, a perfect hash of some combination of the patient's name, gender, date of birth, social security number, etc. would produce an identifier that is unique to each patient but that would not reveal the patient's identity. Suppose the providers use the same hash function and the same demographic data are entered at two different providers. The hashes of those data will then be identical, and records from the two providers can be matched. In practice, however, this method is likely to be less reliable than either a direct match on patient identifiers or the use of a common security broker. Demographic data are frequently entered with minor deviations, and any deviation results in a complete mismatch of the hashes. It is possible that even a modest match rate would be adequate for surveillance, which relies on aggregate results.
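As a sketch of hash-based matching, the function below normalizes a few demographic fields and hashes them with SHA-256. SHA-256 is a stand-in for the "perfect" one-way hash. It is not literally collision-free, but collisions are not expected in practice. The normalization rules and the choice of fields are assumptions.

```python
import hashlib
import unicodedata


def normalize(value: str) -> str:
    # Strip accents, collapse whitespace, and upper-case so that trivial
    # formatting differences between providers do not change the hash.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.upper().split())


def match_key(last: str, first: str, sex: str, birth_date: str) -> str:
    """One-way hash of normalized demographics (SHA-256 as an assumed stand-in)."""
    joined = "|".join(normalize(v) for v in (last, first, sex, birth_date))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Usage: formatting differences still match; a misspelling does not.
assert match_key("Doe", "Jane", "F", "1970-02-01") == match_key(" doe ", "JANE", "f", "1970-02-01")
assert match_key("Doe", "Jane", "F", "1970-02-01") != match_key("Doe", "Jayne", "F", "1970-02-01")
```

Names and birth dates are guessable, so an unkeyed hash could be reversed by hashing candidate demographics. A keyed hash, such as HMAC with a secret shared among the participating providers, is a common hardening. That choice is an assumption beyond the text above.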
HIPAA Accounting of Disclosures
HIPAA generally requires an accounting of disclosures of protected health information: health care providers must keep track of disclosures and report them to patients on request. Disclosures required by law, disclosures to public health authorities, and disclosures for research require accounting, whereas disclosures of de-identified information and of limited data sets do not. Expanded public health surveillance may therefore require institutions to keep track of every disclosure (e.g., every real-time data transfer to the health department for each patient).
HIPAA provides for summary accounting of multiple disclosures (HIPAA Section 164.528(b)(3)),8 which is intended to simplify accounting, although its interpretation is somewhat controversial.22 It applies when multiple disclosures are made to the same entity for the same purpose. In that case, one need only report:

- the details of the first disclosure during the accounting period of interest;
- the frequency, periodicity, or number of disclosures during the period;
- the date of the last disclosure during the period.
The Centers for Disease Control and Prevention guidance on HIPAA10 states that multiple disclosures can span multiple patients. The best form of accounting remains unclear. A simple form of accounting would record three things:

- detailed information for the first report made for a given purpose since the HIPAA Privacy Rule came into effect;
- the periodicity of potential disclosures (for example, reports are potentially sent daily);
- the date of the last potential disclosure (for example, the last day of the accounting period).

Taken more literally, however, the section would require the provider to know the first actual disclosure during an arbitrary accounting period, the actual number of disclosures during the period, and the date of the last actual disclosure. These data would probably have to be derived from a detailed accounting record, so little would be saved in record keeping. A range of interpretations has been noted.22–26
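Under the literal reading, the summary fields have to be computed from a detailed disclosure log, as in this sketch. The `Disclosure` log format and the `summarize` function are hypothetical. The result holds the first disclosure in the period, the number of disclosures, and the date of the last one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Disclosure:
    recipient: str
    purpose: str
    disclosed_on: date
    description: str


@dataclass(frozen=True)
class SummaryAccounting:
    first: Disclosure  # reported in full detail
    count: int  # number of disclosures in the period
    last_date: date  # date of the last disclosure in the period


def summarize(
    log: list[Disclosure], recipient: str, purpose: str, start: date, end: date
) -> Optional[SummaryAccounting]:
    """Summary for one recipient and purpose over [start, end], from the detailed log."""
    hits = sorted(
        (d for d in log
         if d.recipient == recipient and d.purpose == purpose
         and start <= d.disclosed_on <= end),
        key=lambda d: d.disclosed_on,
    )
    if not hits:
        return None
    return SummaryAccounting(first=hits[0], count=len(hits), last_date=hits[-1].disclosed_on)
```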
At the very least, Section 164.528(b)(3) ensures that when disclosures are reported to patients, a detailed transaction log for that patient need not be printed out (even if it is tracked). Instead, a brief summary suffices.
| Approaches to Reporting |
|---|
| Discussion |
|---|
In cases where expanded public health goals (i.e., non-mandated reporting, rather than traditional public health reporting for such things as outbreaks) can be accomplished without patient identities, safe harbor de-identification (approach 3) and aggregation at the source facilities (approach 6) may be useful. It may be possible to use approach 6 by applying surveillance functions at the level of the virtual medical record or by pushing surveillance to the provider organizations via software distributed by a regional health information organization. Where public health goals require more detailed information, limited data sets may provide a balance between privacy and public good (approach 5).
The most ambitious project for expanded public health reporting in the nation is BioSense.29 The Centers for Disease Control and Prevention (CDC) is receiving clinical data for public health surveillance of bioterrorism, disease outbreaks, and natural disasters. The sources are Veterans Administration and Department of Defense hospitals and clinics, commercial laboratories, and health care facilities around the nation. BioSense has been seeking all data related to "non-identifying patient demographics, diagnoses, chief complaints, microbiology orders/results, radiology orders/results, medication orders, laboratory orders/results, and pharmacy data," including dates and 5-digit zip codes.30 To justify collecting these clinical data, the CDC relies on its designation as a public health authority. It also relies on broadly worded legislation that provides for "the establishment of an integrated system or systems of public health alert communications and surveillance networks between and among—(A) Federal, State, and local public health officials; (B) public and private health-related laboratories, hospitals, and other health care facilities; and (C) any other entities determined appropriate by the Secretary."31 This appears to be consistent with approach 5, and the CDC has been entering into a data sharing agreement with each data source. The CDC explicitly justifies its selected data elements as the minimum necessary for BioSense's mission.30
In summary, expanded public health surveillance faces a number of challenges related to patient privacy and confidentiality. HIPAA provides mechanisms to address some of these challenges, although the appropriate method will vary with the context. Some issues, such as how disclosures must be accounted for, remain unclear. Different combinations and implementations of the approaches defined here will likely be developed in the future.
| Footnotes |
|---|
| References |
|---|