Journal of the American Medical Informatics Association 6:143-150 (1999)
© 1999 American Medical Informatics Association


Research Paper

A Reliability Study for Evaluating Information Extraction from Radiology Reports

George Hripcsak, MD, Gilad J. Kuperman, MD, PhD, Carol Friedman, PhD and Daniel F. Heitjan, PhD

Columbia University, New York, New York (GH, CF, DFH); Partners Healthcare Systems, Boston, Massachusetts (GJK); Queens College CUNY, New York, New York (CF).

Correspondence and reprints: George Hripcsak, MD, 161 Fort Washington Avenue, DAP-1310, New York, NY 10032; e-mail: <hripcsak{at}columbia.edu>.

Goal: To assess the reliability of a reference standard for an information extraction task.

Setting: Twenty-four physician raters from two sites and two specialties read chest radiograph reports and judged whether clinical conditions were present.

Methods: Variance components, generalizability (reliability) coefficients, and the number of expert raters needed to generate a reliable reference standard were estimated.
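The abstract does not describe the study design in detail, so the following is only a minimal sketch of how variance components and a per-rater reliability coefficient could be estimated. It assumes the simplest case: a fully crossed reports-by-raters design for one condition, with one judgment per cell and ANOVA-based estimators. The function names and the example data are hypothetical and are not taken from the paper.

```python
def variance_components(ratings):
    """Estimate variance components for a fully crossed reports x raters design.

    ratings[i][j] is rater j's judgment (for example 0 or 1) on report i.
    Assumption: one observation per cell, random reports and random raters.
    """
    n_p = len(ratings)       # number of reports
    n_r = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n_p * n_r)
    report_means = [sum(row) / n_r for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n_p for j in range(n_r)]

    # Mean squares from the two-way ANOVA without replication.
    ms_report = n_r * sum((m - grand) ** 2 for m in report_means) / (n_p - 1)
    ms_rater = n_p * sum((m - grand) ** 2 for m in rater_means) / (n_r - 1)
    ss_resid = sum(
        (ratings[i][j] - report_means[i] - rater_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_r)
    )
    ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; negative estimates are
    # clamped to zero, a common convention (an assumption here).
    var_report = max((ms_report - ms_resid) / n_r, 0.0)
    var_rater = max((ms_rater - ms_resid) / n_p, 0.0)
    return {"report": var_report, "rater": var_rater, "residual": ms_resid}


def per_rater_reliability(components):
    """Relative generalizability coefficient for a single rater."""
    signal = components["report"]
    return signal / (signal + components["residual"])


# Hypothetical judgments: 4 reports (rows) rated by 3 raters (columns).
example = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(per_rater_reliability(variance_components(example)))
```

This sketch uses the relative coefficient, which ignores systematic differences in rater leniency. An absolute coefficient would add the rater component to the denominator; the abstract does not say which form the authors report.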

Results: Per-rater reliability averaged across conditions was 0.80 (95% CI, 0.79-0.81). Reliability for the nine individual conditions varied from 0.67 to 0.97, with central line presence and pneumothorax the most reliable, and pleural effusion (excluding CHF) and pneumonia the least reliable. One to two raters were needed to achieve a reliability of 0.70, and six raters, on average, were required to achieve a reliability of 0.95. The per-rater reliability was far higher than a previously published per-rater reliability of 0.19 for a more complex task. Differences between sites were attributable to changes to the condition definitions.
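The relationship between per-rater reliability and the number of raters needed can be illustrated with the Spearman-Brown formula, which gives the reliability of the mean of n raters as n·r / (1 + (n − 1)·r). The abstract does not state which calculation the authors used, so treating it as Spearman-Brown is an assumption, and the function names below are hypothetical.

```python
import math


def reliability_of_mean(per_rater, n_raters):
    """Spearman-Brown: reliability of the average of n_raters judgments."""
    return n_raters * per_rater / (1 + (n_raters - 1) * per_rater)


def raters_needed(per_rater, target):
    """Smallest whole number of raters whose average reaches the target."""
    n = target * (1 - per_rater) / (per_rater * (1 - target))
    # A small tolerance keeps floating-point noise from adding a rater.
    return max(1, math.ceil(n - 1e-12))


# Reliabilities quoted in the abstract: lowest, average, and highest condition.
for r in (0.67, 0.80, 0.97):
    print(r, raters_needed(r, 0.70), raters_needed(r, 0.95))
```

Under this assumption, the lowest and highest per-condition reliabilities (0.67 and 0.97) call for two raters and one rater, respectively, to reach 0.70, in line with the reported "one to two raters". Applying the formula to the averaged reliability of 0.80 gives five raters for a target of 0.95; the reported figure of six is an average over conditions and need not equal the value computed from the averaged reliability.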

Conclusion: In these evaluations, physician raters were able to judge very reliably the presence of clinical conditions based on text reports. Once the reliability of a specific rater is confirmed, it would be possible for that rater to create a reference standard reliable enough to assess aggregate measures on a system. Six raters would be needed to create a reference standard sufficient to assess a system on a case-by-case basis. These results should help evaluators design future information extraction studies for natural language processors and other knowledge-based systems.



