
First published October 18, 2007 as JAMIA PrePrint; doi:10.1197/jamia.M2442
J Am Med Inform Assoc. 2008;15:36-39. DOI 10.1197/jamia.M2442.
© 2008 American Medical Informatics Association


Case Report

Identifying Smokers with a Medical Extraction System

Cheryl Clark, PhD (a,*); Kathleen Good, PhD (b); Lesley Jezierny (b); Melissa Macpherson, MA (b); Brian Wilson (b); and Urszula Chajewska, PhD (b)

(a) The MITRE Corporation, Bedford, MA
(b) Dictaphone Healthcare Solutions, Nuance Communications, Inc., Burlington, MA

(*) Correspondence: Cheryl Clark, The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730 (Email: cclark{at}mitre.org).

Received for publication: 03/16/07; accepted for publication: 10/03/07.

The Clinical Language Understanding group at Nuance Communications has developed a medical information extraction system that combines a rule-based extraction engine with machine learning algorithms to identify and categorize references to patient smoking in clinical reports. The extraction engine first identifies smoking references; documents that contain none are classified as UNKNOWN. For the remaining documents, the engine uses linguistic analysis to associate features such as status and time with each smoking mention, and machine learning is then used to classify the documents based on these features. This approach achieves overall accuracy in the 90% range on all data sets used. Classification using both engine-generated and word-based features outperforms classification using word-based features alone on every data set, although the gap narrows as data set size increases. These techniques could be applied to identify other risk factors, such as drug and alcohol use or a family history of a disease.
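The two-stage flow described in the abstract (a rule-based gate that assigns UNKNOWN when no smoking reference is found, followed by feature-based classification of the remaining documents) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the article: the regular expressions, the feature names, the toy training sentences, the category labels other than UNKNOWN, and the use of a 1-nearest-neighbor classifier over Jaccard similarity as a stand-in for whatever learning algorithm the authors actually used.

```python
import re

# Hypothetical cue patterns; the real engine's rules are not given in the abstract.
SMOKING = re.compile(r"\b(smok\w*|tobacco|cigarettes?)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|denies|never|non)\b", re.IGNORECASE)
PAST = re.compile(r"\b(quit|former|stopped|ex)\b", re.IGNORECASE)


def extract_features(text):
    """Return a set of feature strings; empty if smoking is never mentioned."""
    features = set()
    for sentence in re.split(r"[.!?]\s*", text):
        if not SMOKING.search(sentence):
            continue  # only sentences with a smoking mention contribute
        features.add("mention")
        # "Engine-generated" features: status and time attached to the mention.
        if NEGATION.search(sentence):
            features.add("status=negated")
        if PAST.search(sentence):
            features.add("time=past")
        # Word-based features drawn from the same sentence.
        features.update("w=" + w for w in re.findall(r"[a-z]+", sentence.lower()))
    return features


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(text, training):
    """Gate on smoking mentions, then label by the most similar training example."""
    features = extract_features(text)
    if not features:
        return "UNKNOWN"  # no smoking reference found in the document
    scored = ((label, jaccard(features, extract_features(example)))
              for example, label in training)
    return max(scored, key=lambda pair: pair[1])[0]


# Toy training data invented for this sketch; labels are illustrative.
TRAINING = [
    ("Patient denies tobacco use.", "NON-SMOKER"),
    ("She quit smoking ten years ago.", "PAST SMOKER"),
    ("He smokes one pack of cigarettes per day.", "CURRENT SMOKER"),
]

if __name__ == "__main__":
    for report in ["Chest x-ray was clear.",
                   "The patient denies smoking.",
                   "Former smoker, stopped in 1999."]:
        print(report, "->", classify(report, TRAINING))
```

Mixing the status and time features with plain word features in one set mirrors the abstract's comparison of engine-generated plus word-based features against word-based features alone; dropping the `status=` and `time=` entries would give the word-only baseline in this toy setting.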




This article has been cited by other articles:


O. Uzuner, I. Goldstein, Y. Luo, and I. Kohane. Identifying Patient Smoking Status from Medical Discharge Records. J. Am. Med. Inform. Assoc., January 1, 2008; 15(1): 14-24.



