First published October 18, 2007 as JAMIA PrePrint; doi:10.1197/jamia.M2434
J Am Med Inform Assoc. 2008;15:32-35. DOI 10.1197/jamia.M2434.
© 2008 American Medical Informatics Association


Case Report

Five-way Smoking Status Classification Using Text Hot-Spot Identification and Error-correcting Output Codes

Aaron M. Cohen, MD, MS*

Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, Portland, OR.

* Correspondence: Aaron M. Cohen, MD, MS, Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Road, Mail Code: BICC, Portland, OR, 97239-3098 (Email: cohenaa{at}ohsu.edu).

Received for publication: 03/13/07; accepted for publication: 10/03/07.

We participated in the i2b2 smoking status classification challenge task. The purpose of this task was to evaluate the ability of systems to automatically identify patient smoking status from discharge summaries. Our submission included several techniques that we compared and studied, including hot-spot identification, zero-vector filtering, inverse class frequency weighting, error-correcting output codes, and post-processing rules. We evaluated our approaches with the same methods as the i2b2 task organizers, using micro- and macro-averaged F1 as the primary performance metrics. Our best-performing system achieved a micro-F1 of 0.9000 on the test collection, equivalent to the best-performing system submitted to the i2b2 challenge. Hot-spot identification, zero-vector filtering, classifier weighting, and error-correcting output coding contributed additively to increased performance, with hot-spot identification having by far the largest positive effect. High performance on automatic identification of patient smoking status from discharge summaries is achievable with the efficient and straightforward machine learning techniques studied here.
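
As an illustration of the evaluation metric named above, the following is a minimal sketch of micro- and macro-averaged F1 for a single-label, five-way classification. It is not the task organizers' scoring code. The function name `f1_scores`, the placeholder class labels "A" through "E", and the toy data are assumptions made for this example, and macro-F1 is taken here as the unweighted mean of per-class F1, which is one common definition and may differ from the formula used in the challenge.

```python
from collections import Counter


def f1_scores(gold, predicted):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions.

    Hypothetical helper for illustration; not the i2b2 scoring script.
    """
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct: true positive for the gold class
        else:
            fp[p] += 1          # false positive for the predicted class
            fn[g] += 1          # false negative for the gold class

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    # Micro-average: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-average (assumed definition): mean of the per-class F1 values.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy data with placeholder labels; these are not the challenge classes or results.
gold = ["A", "A", "B", "B", "C", "D", "E", "E"]
pred = ["A", "B", "B", "B", "C", "D", "E", "A"]

micro, macro = f1_scores(gold, pred)
print(f"micro-F1 = {micro:.4f}, macro-F1 = {macro:.4f}")
```

When every document receives exactly one label, micro-F1 equals plain accuracy, so it is dominated by the frequent classes, whereas macro-F1 gives each class equal weight regardless of how many documents it contains.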



