
First published June 23, 2006 as JAMIA PrePrint; doi:10.1197/jamia.M2077
Journal of the American Medical Informatics Association 2006;13(5):516-525
© 2006 American Medical Informatics Association


A more recent version of this article appeared on September 1, 2006

Submitted on February 6, 2006
Accepted on May 30, 2006

Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques

Serguei V. S. Pakhomov PhD1*, James D. Buntrock MS1, and Christopher G. Chute MD, DrPH1

Affiliation of the authors: 1 Division of Biomedical Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN

* To whom correspondence should be addressed.

Objective Human classification of diagnoses is a labor-intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes.

Methods We have developed an automated coding system designed to assign codes to clinical diagnoses. The system uses the notion of certainty to recommend subsequent processing. Codes with the highest certainty are generated by matching the diagnostic text to frequent examples in a database of 22 million manually coded entries. These code assignments are not subject to subsequent manual review. Codes at a lower certainty level are assigned by matching to examples that were coded infrequently in the past. The least certain codes are generated by a naïve Bayes classifier. The latter two types of codes are subsequently manually reviewed.
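The three certainty levels described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name TieredCoder, the normalize helper, the frequency cutoff of 2, the placeholder codes, and the simple bag-of-words naive Bayes model with add-one smoothing are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase and collapse whitespace (a hypothetical normalization step)."""
    return " ".join(text.lower().split())


class TieredCoder:
    """Illustrative three-tier coder: frequent match, infrequent match, naive Bayes."""

    def __init__(self, coded_examples, freq_threshold=2):
        # freq_threshold is an assumed cutoff; the abstract gives no value.
        self.freq_threshold = freq_threshold
        self.example_counts = defaultdict(Counter)  # text -> Counter of codes
        self.code_counts = Counter()                # code -> number of examples
        self.word_counts = defaultdict(Counter)     # code -> Counter of words
        self.vocab = set()
        for text, code in coded_examples:
            key = normalize(text)
            self.example_counts[key][code] += 1
            self.code_counts[code] += 1
            for word in key.split():
                self.word_counts[code][word] += 1
                self.vocab.add(word)

    def _naive_bayes(self, key):
        """Return the most probable code under a smoothed bag-of-words model."""
        total = sum(self.code_counts.values())
        best_code, best_score = None, -math.inf
        for code, n in self.code_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[code].values()) + len(self.vocab)
            for word in key.split():
                score += math.log((self.word_counts[code][word] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code

    def assign(self, text):
        """Return (code, certainty, needs_manual_review)."""
        key = normalize(text)
        seen = self.example_counts.get(key)
        if seen:
            code, count = seen.most_common(1)[0]
            if count >= self.freq_threshold:
                return code, "high", False   # frequent example: no review
            return code, "medium", True      # infrequent example: review
        return self._naive_bayes(key), "low", True  # classifier: review
```

With a toy database such as three entries of "Chest pain" coded "A" and one entry of "diabetes mellitus type 2" coded "B" (placeholder codes, not real diagnosis codes), assign("chest pain") falls in the high-certainty tier, assign("Diabetes mellitus type 2") in the medium tier, and any unseen text is routed to the classifier and flagged for review.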

Measurements Standard information retrieval accuracy measurements of precision, recall and f-measure were used. Micro- and macro-averaged results were computed.
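As a reminder of how these measurements differ, the sketch below computes micro- and macro-averaged precision, recall and f-measure for a single-label coding task. It makes two assumptions that the abstract does not confirm: each entry carries exactly one gold code and one predicted code, and the macro-averaged f-measure is taken as the mean of the per-code f-measures. The function names are hypothetical.

```python
from collections import Counter


def _prf(tp, fp, fn):
    """Precision, recall and f-measure from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(gold, predicted):
    """Return micro- and macro-averaged (precision, recall, f) tuples."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted code was wrong
            fn[g] += 1  # gold code was missed
    codes = sorted(set(gold) | set(predicted))
    if not codes:
        return {"micro": (0.0, 0.0, 0.0), "macro": (0.0, 0.0, 0.0)}
    # Macro: average the per-code scores, so every code counts equally.
    per_code = [_prf(tp[c], fp[c], fn[c]) for c in codes]
    macro = tuple(sum(s[i] for s in per_code) / len(codes) for i in range(3))
    # Micro: pool the counts first, so frequent codes dominate.
    micro = _prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return {"micro": micro, "macro": macro}
```

Macro-averaging weights rare and common codes equally, while micro-averaging reflects performance on the overall volume of entries, which is why reporting both is informative when code frequencies are highly skewed.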

Results At least 48% of all EMR problem list entries at the Mayo Clinic can be automatically classified with macro-averaged 98.0% precision, 98.3% recall and an f-score of 98.2%. An additional 34% of the entries are classified with macro-averaged 90.1% precision, 95.6% recall and 93.1% f-score. The remaining 18% of the entries are classified with macro-averaged 58.5%.

Conclusion Over two-thirds of all diagnoses are coded automatically with high accuracy. The system has been successfully implemented at the Mayo Clinic, reducing the staff engaged in manual coding from thirty-four coders to seven verifiers.







