
First published October 18, 2007 as JAMIA PrePrint; doi:10.1197/jamia.M2437
Journal of the American Medical Informatics Association 2008;15(1):25-28
© 2008 American Medical Informatics Association


A more recent version of this article appeared on January 1, 2008

Submitted on March 15, 2007
Accepted on September 16, 2007

Mayo Clinic NLP System for Patient Smoking Status Identification

Guergana K. Savova, PhD¹*, Philip V. Ogren, MS¹, Patrick H. Duffy¹, James D. Buntrock, MS¹, and Christopher G. Chute, MD, DrPH¹

Affiliation of the authors: ¹ Biomedical Informatics, Mayo Clinic, Rochester, MN

* To whom correspondence should be addressed.

This paper describes our system entry for the 2006 i2b2 contest "Challenges in Natural Language Processing for Clinical Data", specifically the task of identifying the smoking status of patients. Our system makes the simplifying assumption that a patient's smoking status can be determined by accurately classifying individual sentences from that patient's record. We built the system from reusable text analysis components based on the Unstructured Information Management Architecture (UIMA) and WEKA. This code reuse minimized the development effort specific to the smoking status classifier. We report precision, recall, and F-score, with 95% exact confidence intervals for each metric. Recasting the classification task at the sentence level and reusing code from other text analysis projects allowed us to quickly build a classification system that achieves a system F-score of 92.64 on held-out data and 85.57 on the formal evaluation data. Our general medical natural language engine is easily adaptable to a real-world medical informatics application. Its limitations for this use case include negation detection and temporal resolution.
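To make the reported metrics concrete, the following is a minimal sketch of how precision, recall, and F-score can be computed from classification counts, together with an exact binomial confidence interval for a proportion. It assumes that "exact" refers to the Clopper-Pearson interval, which the abstract does not state. The function names and the counts in the usage lines are hypothetical and are not taken from the paper. The sketch gives intervals for precision and recall only, because the abstract does not say how the interval for the F-score was derived.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(f, iters=60):
    """Bisection on [0, 1] for a function that rises from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials (assumed method)."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root_of_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root_of_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


def precision_recall_f(tp, fp, fn):
    """Precision, recall, and balanced F-score from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical counts for one smoking-status class (not from the paper).
tp, fp, fn = 8, 2, 2
precision, recall, f_score = precision_recall_f(tp, fp, fn)
precision_ci = exact_ci(tp, tp + fp)  # precision is tp out of all predicted
recall_ci = exact_ci(tp, tp + fn)     # recall is tp out of all actual
```

Precision and recall are each a proportion of successes out of a fixed number of trials, which is why a binomial interval applies to them directly; the F-score combines the two and is not a simple proportion.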




This article has been cited by other articles:


O. Uzuner, I. Goldstein, Y. Luo, and I. Kohane
Identifying Patient Smoking Status from Medical Discharge Records
J. Am. Med. Inform. Assoc., January 1, 2008; 15(1): 14 - 24.



