
First published December 11, 2008 as JAMIA PrePrint; doi:10.1197/jamia.M2902
Journal of the American Medical Informatics Association 2009;16(2):256-266
© 2009 American Medical Informatics Association


A more recent version of this article appeared on March 1, 2009

Submitted on June 18, 2008
Accepted on November 30, 2008

Evaluating Predictors of Geographic Area Population Size Cutoffs to Manage Re-identification Risk

Khaled El Emam¹*, Ann Brown², and Philip AbdelMalik³

Affiliation of the authors: ¹ Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada; Pediatrics, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; GIS Infrastructure, Office of Public Health Practice, Public Health Agency of Canada, Ottawa, Ontario; ² Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada; ³ GIS Infrastructure, Office of Public Health Practice, Public Health Agency of Canada, Ottawa, Ontario

* To whom correspondence should be addressed.

Objective In public health and health services research, the inclusion of geographic information in data sets is critical. Because of concerns over the re-identification of patients, either data from small geographic areas are suppressed or the areas are aggregated into larger ones. Our objective is to estimate the population size cutoff above which a geographic area is large enough that no data suppression or further aggregation is necessary.

Design Data from the 2001 Canadian census were used in a simulation to model the relationship between geographic area population size and uniqueness for some common demographic variables. Cutoffs are computed for geographic area population size, and prediction models are developed to estimate the appropriate cutoffs.

Measurements Re-identification risk is measured using uniqueness. Geographic area population size cutoffs are estimated using the maximum number of possible values in the data set and a traditional entropy measure.
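As a rough illustration of the three quantities named in the Measurements paragraph, the sketch below computes uniqueness, the maximum number of possible value combinations, and Shannon entropy for a toy set of records. It is only a sketch under stated assumptions: the function names, the toy records, and the choice of Shannon entropy as the "traditional entropy measure" are all illustrative and are not taken from the article.

```python
from collections import Counter
from math import log2, prod


def uniqueness(records):
    """Fraction of records whose combination of values occurs exactly once."""
    counts = Counter(records)
    unique = sum(1 for r in records if counts[r] == 1)
    return unique / len(records)


def max_combinations(domain_sizes):
    """Product of the number of possible values of each variable."""
    return prod(domain_sizes)


def shannon_entropy(records):
    """Shannon entropy (in bits) of the observed value combinations.

    Assumption: Shannon entropy stands in for the article's entropy measure.
    """
    counts = Counter(records)
    n = len(records)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical toy area: (sex, age group) for six residents.
area = [
    ("F", "30-39"), ("F", "30-39"), ("M", "30-39"),
    ("M", "40-49"), ("F", "40-49"), ("F", "40-49"),
]

print(uniqueness(area))           # share of residents unique on (sex, age group)
print(max_combinations([2, 2]))   # 2 sexes x 2 age groups
print(shannon_entropy(area))
```

In this toy area, two of the six residents are unique on the chosen variables. A smaller area or a finer set of variables would generally push uniqueness higher, which is the relationship the simulation explores.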

Results The model that predicted population cutoffs from the maximum number of possible values in the data set had R² values around 0.9 and a relative error of prediction of less than 0.02 across all regions of Canada. The models are then applied to assess the appropriate geographic area size for the prescription records provided by retail and hospital pharmacies to commercial research and analysis firms.
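For readers less familiar with the two fit statistics reported in the Results, the following sketch shows one common way to compute them. The abstract does not give the article's exact definition of relative error of prediction, so the mean of |actual - predicted| / actual is assumed here, and the cutoff values are made-up numbers, not results from the study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_relative_error(actual, predicted):
    """Mean of |actual - predicted| / actual (an assumed definition)."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical population cutoffs: simulated values vs. model predictions.
actual_cutoffs = [1000, 2000, 4000]
predicted_cutoffs = [1010, 1980, 4040]

print(r_squared(actual_cutoffs, predicted_cutoffs))
print(mean_relative_error(actual_cutoffs, predicted_cutoffs))
```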

Conclusion To manage re-identification risk, the prediction models can be used by public health professionals, health researchers and research ethics boards to decide when the geographic area population size is sufficiently large.







Copyright © 1994 by the American Medical Informatics Association.