
First published October 24, 2008 as JAMIA PrePrint; doi:10.1197/jamia.M2862
J Am Med Inform Assoc. 2009;16:37-39. DOI 10.1197/jamia.M2862.
© 2009 American Medical Informatics Association


Technical Brief

Repurposing the Clinical Record: Can an Existing Natural Language Processing System De-identify Clinical Notes?

Frances P. Morrison, MD, MPH, MA*, Li Li, MS, Albert M. Lai, PhD and George Hripcsak, MD, MS

Columbia University Department of Biomedical Informatics, New York, NY

* Correspondence: Frances Morrison, 622 West 168th Street, Vanderbilt Clinic, 5th Floor, New York, New York 10032 (Email: frances.morrison@dbmi.columbia.edu).

Received for publication: 05/16/08; accepted for publication: 09/30/08.

Electronic clinical documentation can be useful for activities such as public health surveillance, quality improvement, and research, but existing methods of de-identification may not provide sufficient protection of patient data. The general-purpose natural language processor MedLEE retains medical concepts while excluding the remaining text, so in addition to processing text into structured data, it may be able to provide a secondary benefit of de-identification. Without modifying the system, the authors tested the ability of MedLEE to remove protected health information (PHI) by comparing 100 outpatient clinical notes with the corresponding XML-tagged output. Of 809 instances of PHI, 26 (3.2%) were detected in the output as a result of processing and identification errors. However, the PHI that remained in the output was highly transformed, much of it appearing as normalized terms for medical concepts, potentially making re-identification more difficult. The MedLEE processor may be a good enhancement to other de-identification systems, both removing PHI and providing coded data from clinical text.
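The reported leak rate can be reproduced from the two counts given above. The short Python sketch below does that arithmetic and also illustrates one simple way a comparison between annotated PHI and processed output might be automated. The function names and the verbatim, case-insensitive string matching are assumptions made for illustration; they are not the authors' evaluation procedure and not part of MedLEE. Because the abstract notes that surviving PHI was often transformed into normalized terms, verbatim matching of this kind would likely miss some leaked instances.

```python
from typing import Iterable, List


def phi_leak_rate(leaked: int, total: int) -> float:
    """Fraction of annotated PHI instances still detectable in the output."""
    if total <= 0:
        raise ValueError("total must be positive")
    return leaked / total


def find_leaked_phi(phi_instances: Iterable[str], output_text: str) -> List[str]:
    """Return annotated PHI strings that appear verbatim in the output.

    Hypothetical helper: matching is case-insensitive substring search,
    which will not catch PHI that was normalized or otherwise transformed.
    """
    lowered = output_text.lower()
    return [p for p in phi_instances if p.lower() in lowered]


# Counts taken from the abstract: 26 of 809 PHI instances were detected.
rate = phi_leak_rate(26, 809)
print(f"{rate:.1%}")  # 3.2%
```

A manual review step would still be needed alongside any such check, since the matching above only flags PHI that survives unchanged.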
