
First published June 25, 2008 as JAMIA PrePrint; doi:10.1197/jamia.M2716
J Am Med Inform Assoc. 2008;15:627-637. DOI 10.1197/jamia.M2716.
© 2008 American Medical Informatics Association


Research Paper

Protecting Privacy Using k-Anonymity

Khaled El Emam, PhD (a, b, *) and Fida Kamal Dankar, MSc (a)

a Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada
b Pediatrics, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada

* Correspondence: Khaled El Emam, Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, Ontario K1J 8L1, Canada (Email: kelemam{at}uottawa.ca).

Received for publication: 01/09/08; accepted for publication: 05/21/08.

Objective: There is increasing pressure to share health information and even make it publicly available. However, such disclosures of personal health information raise serious privacy concerns. To alleviate such concerns, it is possible to anonymize the data before disclosure. One popular anonymization approach is k-anonymity. There have been no evaluations of the actual re-identification probability of k-anonymized data sets.

Design: Through a simulation on three large data sets, we evaluated the re-identification risk of baseline k-anonymization and of three different improvements to it.

Measurement: Re-identification probability is measured under two different re-identification scenarios. Information loss is measured by the commonly used discernability metric.
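For readers unfamiliar with these measures, the following is a minimal sketch and not the authors' implementation. It assumes the usual reading of k-anonymity (every combination of quasi-identifier values is shared by at least k records) and a simple form of the discernability metric (the sum of squared equivalence class sizes, with no suppressed records). The function names, the toy records, and the choice of `age` and `sex` as quasi-identifiers are all hypothetical. The 1/(class size) probability shown applies when an adversary knows the target is in the data set; the abstract does not spell out its two scenarios, so this is an assumption.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())


def discernability(records, quasi_identifiers):
    """Sum of squared class sizes (assumed form, no suppression); larger means more information loss."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return sum(n * n for n in sizes.values())


def max_reid_probability(records, quasi_identifiers):
    """Worst-case 1/(class size), assuming a non-empty data set and a target known to be in it."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return 1 / min(sizes.values())


# Hypothetical, already-generalized toy records (not from the paper's data sets).
records = [
    {"age": "30-39", "sex": "F", "diagnosis": "asthma"},
    {"age": "30-39", "sex": "F", "diagnosis": "flu"},
    {"age": "30-39", "sex": "F", "diagnosis": "asthma"},
    {"age": "40-49", "sex": "M", "diagnosis": "diabetes"},
    {"age": "40-49", "sex": "M", "diagnosis": "flu"},
]
QI = ["age", "sex"]  # assumed quasi-identifiers

print(is_k_anonymous(records, QI, 2))
print(discernability(records, QI))
print(max_reid_probability(records, QI))
```

In this toy data the two equivalence classes have sizes 3 and 2, so the set is 2-anonymous but not 3-anonymous. Generalizing further to reach a larger k would merge classes and raise the discernability score, which is the trade-off between risk and information loss that the study measures.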

Results: For one of the re-identification scenarios, k-anonymity consistently over-anonymizes data sets, with this over-anonymization being most pronounced with small sampling fractions. Over-anonymization results in excessive distortions to the data (i.e., high information loss), making the data less useful for subsequent analysis. We found that a hypothesis testing approach provided the best control over re-identification risk and reduced the extent of information loss compared to baseline k-anonymity.

Conclusion: Guidelines are provided on when to use the hypothesis testing approach instead of baseline k-anonymity.






