First published October 18, 2007 as JAMIA PrePrint; doi:10.1197/jamia.M2408
J Am Med Inform Assoc. 2008;15:14-24. DOI 10.1197/jamia.M2408.
© 2008 American Medical Informatics Association


Viewpoint Paper

Identifying Patient Smoking Status from Medical Discharge Records

Özlem Uzuner, PhD (a,b,*), Ira Goldstein, MBA (a), Yuan Luo, MS (a), and Isaac Kohane, MD, PhD (c)

a University at Albany, State University of New York, Albany, NY
b Massachusetts Institute of Technology, Boston, MA
c Children’s Hospital and Harvard Medical School, Boston, MA

* Correspondence: Özlem Uzuner, PhD, University at Albany, SUNY, Draper 114A, 135 Western Avenue, Albany, NY 12222 (Email: ouzuner{at}albany.edu).

Received for publication: 02/21/07; accepted for publication: 06/30/07.

The authors organized a Natural Language Processing (NLP) challenge on automatically determining the smoking status of patients from information found in their discharge records. This challenge was issued as part of the i2b2 (Informatics for Integrating Biology and the Bedside) project, to survey, facilitate, and examine studies in medical language understanding for clinical narratives. This article describes the smoking challenge, details the data and the annotation process, explains the evaluation metrics, discusses the characteristics of the systems developed for the challenge, presents an analysis of the results of the received system runs, draws conclusions about the state of the art, and identifies directions for future research. A total of 11 teams participated in the smoking challenge. Each team submitted up to three system runs, for a total of 23 submissions. The submitted system runs were evaluated with microaveraged and macroaveraged precision, recall, and F-measure. The systems submitted to the smoking challenge represented a variety of machine learning and rule-based algorithms. Despite the differences in their approaches to smoking status identification, many of these systems provided good results: 12 system runs had microaveraged F-measures above 0.84. Analysis of the results highlighted the fact that discharge summaries express smoking status using a limited number of textual features (e.g., "smok", "tobac", "cigar", "Social History"). Many of the effective smoking status identifiers benefit from these features.
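The microaveraged and macroaveraged metrics mentioned in the abstract can be illustrated with a minimal sketch. It is not the challenge's evaluation script: the function names, the label strings, and the toy gold/predicted lists are all hypothetical, and macroaveraged F-measure is computed here as the mean of the per-class F-measures, which is one common convention and only an assumption about what the challenge used.

```python
from collections import Counter


def per_class_counts(gold, pred):
    """Count true positives, false positives, and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[g] += 1  # the gold class gains a false negative
    return tp, fp, fn


def prf(tp, fp, fn):
    """Precision, recall, and balanced F-measure from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def micro_macro(gold, pred):
    """Return (micro, macro), each a (precision, recall, F-measure) tuple."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = per_class_counts(gold, pred)
    # Microaverage: pool the counts over all classes, then compute once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macroaverage: compute per class, then take the unweighted mean.
    per_class = [prf(tp[c], fp[c], fn[c]) for c in labels]
    macro = tuple(sum(m[i] for m in per_class) / len(per_class) for i in range(3))
    return micro, macro


# Toy data with hypothetical label strings (not taken from the challenge corpus).
gold = ["NON-SMOKER", "NON-SMOKER", "NON-SMOKER", "UNKNOWN", "UNKNOWN", "CURRENT SMOKER"]
pred = ["NON-SMOKER", "NON-SMOKER", "UNKNOWN", "UNKNOWN", "UNKNOWN", "NON-SMOKER"]

micro, macro = micro_macro(gold, pred)
print("micro P/R/F:", [round(x, 3) for x in micro])
print("macro P/R/F:", [round(x, 3) for x in macro])
```

Microaveraging weights every record equally, so frequent classes dominate the score; macroaveraging weights every class equally, so a rare class that a system misses entirely (the single current smoker in the toy data) pulls the macroaverage down much more than the microaverage.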




This article has been cited by other articles:


L. W. D'Avolio and A. A.T. Bui
The Clinical Outcomes Assessment Toolkit: A Framework to Support Automated Clinical Records-based Outcomes Assessment and Performance Measurement Research
J. Am. Med. Inform. Assoc., May 1, 2008; 15(3): 333 - 340.


R. Wicentowski and M. R. Sydes
Using Implicit Information to Identify Smoking Status in Smoke-blind Medical Discharge Summaries
J. Am. Med. Inform. Assoc., January 1, 2008; 15(1): 29 - 31.