J Am Med Inform Assoc. 2002;9:S115-S119. DOI 10.1197/jamia.M1241.
© 2002 American Medical Informatics Association


Article

Effects of Data Anonymization by Cell Suppression on Descriptive Statistics and Predictive Modeling Performance

Lucila Ohno-Machado, MD, PhD, Staal Vinterbo, PhD and Stephan Dreiseitl, PhD

Affiliations of the authors: Decision Systems Group, Brigham and Women’s Hospital, Harvard Medical School, Division of Health Sciences and Technology, Massachusetts Institute of Technology, Boston, Massachusetts (LO-M, SV); and Department of Software Engineering for Medicine, Polytechnic University of Upper Austria, Hagenberg, Austria (SD). e-mail: <machado{at}dsg.harvard.edu>.

Abstract

Protecting individual data in disclosed databases is essential. Data anonymization strategies can produce table ambiguation by suppressing selected cells. With table ambiguation, different degrees of anonymization can be achieved, depending on the number of individuals from which a particular case must become indistinguishable; this number defines the level of anonymization. Anonymization by cell suppression does not necessarily prevent inferences from being made from the disclosed data, and preventing such inferences may be important to preserve confidentiality. We show that anonymized data sets can preserve descriptive characteristics of the data but might also be used to make inferences about particular individuals, which may not be desirable. The degradation of predictive performance is directly proportional to the degree of anonymity. As an example, we report the effect of anonymization on the predictive performance of a model constructed to estimate the probability of disease given clinical findings.
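To make the notion of an anonymization level concrete, the following minimal Python sketch counts, for each row of a table with suppressed cells, how many other rows it cannot be distinguished from. It is an illustration only and not the authors' algorithm: the function names `compatible` and `anonymity_level`, the use of `None` to mark a suppressed cell, the matching rule (two rows are indistinguishable when they agree on every column in which both values are disclosed), and the toy records are all assumptions introduced here.

```python
from typing import Optional, Sequence

# A row is a sequence of cell values; None marks a suppressed cell (assumed convention).
Row = Sequence[Optional[str]]


def compatible(a: Row, b: Row) -> bool:
    """Return True if the two rows agree wherever both cells are disclosed."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def anonymity_level(table: Sequence[Row]) -> int:
    """Smallest number of *other* rows that any single row is indistinguishable from."""
    return min(
        (
            sum(compatible(row, other) for j, other in enumerate(table) if j != i)
            for i, row in enumerate(table)
        ),
        default=0,  # an empty table has no rows to protect
    )


# Hypothetical toy data: (age, sex, finding). Not taken from the article.
original = [
    ("34", "F", "chest pain"),
    ("34", "M", "chest pain"),
    ("61", "M", "dyspnea"),
    ("58", "M", "dyspnea"),
]

# The same table after suppressing selected cells.
suppressed = [
    ("34", None, "chest pain"),
    ("34", None, "chest pain"),
    (None, "M", "dyspnea"),
    (None, "M", "dyspnea"),
]

print(anonymity_level(original))    # every row is unique
print(anonymity_level(suppressed))  # each row now matches one other row
```

In this toy table, suppressing four cells raises the level from 0 to 1 while leaving the finding column intact. That retained column is the kind of disclosed information from which inferences about individuals could still be drawn.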
