Journal of the American Medical Informatics Association 6:205-218 (1999)
© 1999 American Medical Informatics Association


Research Paper

A Semantic Lexicon for Medical Language Processing

Stephen B. Johnson, PhD

Columbia University, New York, New York.

Correspondence and reprints: Stephen B. Johnson, PhD, Department of Medical Informatics, Columbia University, 161 Fort Washington Avenue, DAP-1310, New York, NY 10032. e-mail: <stephen.johnson@columbia.edu>.

Objective: Construction of a resource that provides semantic information about words and phrases to facilitate the computer processing of medical narrative.

Design: Lexemes (words and word phrases) in the Specialist Lexicon were matched against strings in the 1997 Metathesaurus of the Unified Medical Language System (UMLS) developed by the National Library of Medicine. This yielded a "semantic lexicon," in which each lexeme is associated with one or more syntactic types, each of which can have one or more semantic types. The semantic lexicon was then used to assign semantic types to lexemes occurring in a corpus of discharge summaries (603,306 sentences). Lexical items with multiple semantic types were examined to determine whether some of the types could be eliminated, on the basis of usage in discharge summaries. A concordance program was used to find contrasting contexts for each lexeme that would reflect different semantic senses. Based on this evidence, semantic preference rules were developed to reduce the number of lexemes with multiple semantic types.
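The matching and preference steps described above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the function names, the case-insensitive `normalize` step, the `(preferred, dispreferred)` rule format, and the toy entries are all assumptions introduced here, and the sketch attaches the same semantic types to every syntactic type of a lexeme, which the abstract does not state.

```python
from collections import defaultdict


def normalize(text):
    """Assumed matching key: lowercase with whitespace collapsed."""
    return " ".join(text.lower().split())


def build_semantic_lexicon(lexemes, meta_strings):
    """Match lexemes against Metathesaurus-style strings.

    lexemes:      iterable of (lexeme, syntactic_type) pairs
    meta_strings: iterable of (string, semantic_type) pairs
    Returns {lexeme: {syntactic_type: set of semantic types}} and
    omits lexemes that match no string.
    """
    sem_by_string = defaultdict(set)
    for string, sem_type in meta_strings:
        sem_by_string[normalize(string)].add(sem_type)

    lexicon = defaultdict(dict)
    for lexeme, syn_type in lexemes:
        key = normalize(lexeme)
        sem_types = sem_by_string.get(key)
        if sem_types:
            lexicon[key][syn_type] = set(sem_types)
    return dict(lexicon)


def apply_preference_rules(lexicon, rules):
    """Drop a dispreferred semantic type when its preferred rival is present.

    rules: list of (preferred, dispreferred) pairs (hypothetical format).
    Returns a new lexicon; the input is left unchanged.
    """
    reduced = {}
    for lexeme, by_syn in lexicon.items():
        reduced[lexeme] = {}
        for syn_type, sem_types in by_syn.items():
            kept = set(sem_types)
            for preferred, dispreferred in rules:
                if preferred in kept and dispreferred in kept:
                    kept.discard(dispreferred)
            reduced[lexeme][syn_type] = kept
    return reduced


# Toy data for illustration only; not taken from the UMLS files.
lexemes = [("cold", "noun"), ("cold", "adj"), ("xyzzy", "noun")]
meta_strings = [
    ("Cold", "Disease or Syndrome"),
    ("cold", "Natural Phenomenon or Process"),
]
rules = [("Disease or Syndrome", "Natural Phenomenon or Process")]

lexicon = build_semantic_lexicon(lexemes, meta_strings)
reduced = apply_preference_rules(lexicon, rules)
```

In this toy run, "cold" starts with two semantic types and keeps only the clinically preferred one after the rule is applied, while "xyzzy" never enters the lexicon because it matches no string.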

Results: Matching the Specialist Lexicon against the Metathesaurus produced a semantic lexicon with 75,711 lexical forms, 22,805 (30.1 percent) of which had two or more semantic types. Matching the Specialist Lexicon against one year's worth of discharge summaries identified 27,633 distinct lexical forms, 13,322 of which had at least one semantic type. This suggests that the Specialist Lexicon has about 79 percent coverage for syntactic information and 38 percent coverage for semantic information for discharge summaries. Of those lexemes in the corpus that had semantic types, 3,474 (12.6 percent) had two or more types. When semantic preference rules were applied to the semantic lexicon, the number of entries with multiple semantic types was reduced to 423 (1.5 percent). In the discharge summaries, occurrences of lexemes with multiple semantic types were reduced from 9.41 to 1.46 percent.
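The reported percentages can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the Results; the choice of denominators is an inference from the numbers, not something the abstract states. In particular, 12.6 and 1.5 percent reproduce only when taken against the 27,633 forms matched in the corpus, and the 38 percent semantic coverage reproduces when the corpus total is estimated from the 79 percent syntactic coverage.

```python
def pct(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of semantic-lexicon forms with two or more semantic types.
ambiguous_in_lexicon = pct(22805, 75711)      # 30.1

# Corpus forms with two or more types, before and after preference rules,
# relative to the 27,633 forms matched in the corpus (inferred denominator).
ambiguous_before = pct(3474, 27633)           # 12.6
ambiguous_after = pct(423, 27633)             # 1.5

# Estimated number of distinct forms in the corpus, assuming 27,633
# matched forms correspond to the reported 79 percent coverage.
estimated_corpus_forms = 27633 / 0.79

# Semantic coverage implied by that estimate.
semantic_coverage = pct(13322, estimated_corpus_forms)  # 38.1

print(ambiguous_in_lexicon, ambiguous_before, ambiguous_after, semantic_coverage)
```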

Conclusion: Automatic methods can be used to construct a semantic lexicon from existing UMLS sources. This semantic information can aid natural language processing programs that analyze medical narrative, provided that lexemes with multiple semantic types are kept to a minimum. Semantic preference rules can be used to select semantic types that are appropriate to clinical reports. Further work is needed to increase the coverage of the semantic lexicon and to exploit contextual information when selecting semantic senses.




This article has been cited by other articles:

J. C. Denny, J. D. Smithers, R. A. Miller, and A. Spickard III
"Understanding" Medical School Curriculum Content Using KnowledgeMap
J. Am. Med. Inform. Assoc., July 1, 2003; 10(4): 351-362.

H. Liu, S. B. Johnson, and C. Friedman
Automatic Resolution of Ambiguous Terms Based on Machine Learning and Conceptual Relations in the UMLS
J. Am. Med. Inform. Assoc., November 1, 2002; 9(6): 621-636.

B. L. Humphreys
Electronic Health Record Meets Digital Library: A New Environment for Achieving an Old Goal
J. Am. Med. Inform. Assoc., September 1, 2000; 7(5): 444-452.



