
Journal of the American Medical Informatics Association 9:S110-S114 (2002)
© 2002 American Medical Informatics Association


Article

Disambiguation Data: Extracting Information from Anonymized Sources

Stephan Dreiseitl, PhD, Staal Vinterbo, PhD and Lucila Ohno-Machado, MD, MHA, PhD

Affiliations of the authors: Polytechnic University of Upper Austria at Hagenberg, Department of Software Engineering for Medicine, A-4232 Hagenberg, Austria (SD); Decision Systems Group, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115 (SV, LO-M). Email: Stephan.Dreiseitl@fhs-hagenberg.ac.at.

Abstract

Privacy protection is an important consideration when releasing medical databases to the research community. We show that although recent advances in anonymization algorithms provide increased levels of protection, it is still possible to compute approximations of the original data set. In some cases, individual entries of a table can even be reconstructed uniquely, exactly as they were before anonymization.

In this paper, we demonstrate how knowledge of an anonymization algorithm that works by ambiguating data cell entries can be used to undo the anonymization process. We investigate the effect of this algorithm and its reversal on data sets of varying sizes and distributions. We show that a computationally complex disambiguation process can extract information on individuals from an anonymized data set.
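
The general idea of reversing a known anonymizer can be illustrated with a small toy sketch. The abstract does not specify the anonymization algorithm, so everything below is an assumption made for illustration: `ambiguate` is a hypothetical, deterministic stand-in that replaces cells of a unique row with sets of possible values, and `disambiguate` is a brute-force search that keeps only those candidate tables which the known anonymizer would map to the released table. The search enumerates every consistent candidate, so its cost grows exponentially with the number of ambiguated cells.

```python
from itertools import product


def ambiguate(table):
    """Toy anonymizer (hypothetical, not the algorithm from the paper).

    A row that has an exact duplicate is released unchanged (singleton sets).
    A unique row is merged cell-wise with its nearest other row, so every
    cell in which the two rows differ becomes a set of two possible values.
    Requires a table with at least two rows.
    """
    released = []
    for i, row in enumerate(table):
        others = [r for j, r in enumerate(table) if j != i]
        if row in others:
            released.append(tuple(frozenset([v]) for v in row))
            continue
        # Nearest row by Hamming distance; ties go to the earliest row.
        nearest = min(others, key=lambda r: sum(a != b for a, b in zip(row, r)))
        released.append(tuple(frozenset([a, b]) for a, b in zip(row, nearest)))
    return released


def disambiguate(released):
    """Brute-force reversal: keep candidates that reproduce the release."""
    # Every row can be any combination of the values allowed by its cells.
    row_options = [list(product(*(sorted(cell) for cell in row)))
                   for row in released]
    return [list(candidate)
            for candidate in product(*row_options)
            if ambiguate(list(candidate)) == released]


# Made-up example data: three records with two binary attributes.
original = [(0, 0), (0, 0), (0, 1)]
released = ambiguate(original)
candidates = disambiguate(released)
print(candidates)
```

In this made-up example the released table is consistent with two candidate tables, but only one of them is mapped back to the released table by the toy anonymizer, so the original entries are recovered uniquely. Real anonymizers and data sets will generally leave more than one surviving candidate.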