
First published July 23, 2002 as JAMIA PrePrint; doi:10.1197/jamia.M1101
J Am Med Inform Assoc. 2002;9:621-636. DOI 10.1197/jamia.M1101.
© 2002 American Medical Informatics Association


Research Paper

Automatic Resolution of Ambiguous Terms Based on Machine Learning and Conceptual Relations in the UMLS

Hongfang Liu, MS, Stephen B. Johnson, PhD and Carol Friedman, PhD

Affiliations of the authors: City University of New York, New York, New York (HL); Columbia University, New York, New York (SBJ, CF).

Correspondence and reprints: Hongfang Liu, MS, Department of Medical Informatics, Columbia University, VC-5, 622 W. 168 Street, New York, NY 10032; e-mail: <hongfang.liu@dmi.columbia.edu>.

Received for publication: 02/15/02; accepted for publication: 05/22/02.

Motivation. The UMLS has been used in natural language processing applications such as information retrieval and information extraction systems, for which mapping free text to UMLS concepts is an important step. Improving that mapping requires a method for disambiguating terms that correspond to more than one UMLS concept. In general English, machine-learning techniques have been applied to sense-tagged corpora, in which the senses (or concepts) of ambiguous terms have been annotated, mostly by hand. Sense disambiguation classifiers derived from those corpora then determine the senses of ambiguous terms automatically. Manual annotation of a corpus is expensive, however. We propose an automatic method that constructs sense-tagged corpora for ambiguous UMLS terms from MEDLINE abstracts.

Methods. For a term W that represents multiple UMLS concepts, a collection of MEDLINE abstracts that contain W is extracted. For each abstract in the collection, occurrences of concepts that have relations with W as defined in the UMLS are automatically identified. A sense-tagged corpus, in which senses of W are annotated, is then derived based on those identified concepts. The method was evaluated on a set of 35 frequently occurring ambiguous biomedical abbreviations using a gold standard set that was automatically derived. The quality of the derived sense-tagged corpus was measured using precision and recall.
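The tagging step described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the plain substring matching, the rule that leaves ties and zero-evidence abstracts untagged, and the toy related-term lists for the abbreviation "PCA" are all assumptions made for the example. A real system would draw related concepts from the UMLS relation tables and use a proper concept matcher.

```python
from collections import Counter


def tag_sense(abstract_text, related_terms_by_sense):
    """Pick the sense whose related terms occur most often in the abstract.

    related_terms_by_sense maps a sense label to a set of lowercase strings
    naming concepts related to that sense. This is a hypothetical stand-in
    for relations defined in the UMLS. Returns None when there is no
    evidence or when the top two senses tie (an assumed rule).
    """
    text = abstract_text.lower()
    counts = Counter()
    for sense, terms in related_terms_by_sense.items():
        # Naive substring match; a real matcher would work on tokens/concepts.
        counts[sense] = sum(1 for term in terms if term in text)
    ranked = counts.most_common(2)
    if not ranked or ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def build_sense_tagged_corpus(term, abstracts, related_terms_by_sense):
    """Return (abstract, sense) pairs for abstracts that contain the term."""
    corpus = []
    for abstract in abstracts:
        if term.lower() not in abstract.lower():
            continue  # only abstracts containing W are collected
        sense = tag_sense(abstract, related_terms_by_sense)
        if sense is not None:
            corpus.append((abstract, sense))
    return corpus


# Toy data, invented for illustration only.
RELATED = {
    "patient-controlled analgesia": {"morphine", "pain"},
    "principal component analysis": {"variance", "eigenvector"},
}
ABSTRACTS = [
    "PCA with morphine reduced postoperative pain.",
    "PCA of the expression data explained most variance.",
    "PCA was performed.",
    "No abbreviation here about pain.",
]

if __name__ == "__main__":
    for text, sense in build_sense_tagged_corpus("PCA", ABSTRACTS, RELATED):
        print(sense, "<-", text)
```

In this sketch the third abstract contains the term but no related concept, so it stays untagged, and the fourth is never collected because it does not contain the term.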

Results. The derived sense-tagged corpus had an overall precision of 92.9% and an overall recall of 47.4%. After removing rare senses and ignoring abbreviations with closely related senses, the overall precision was 96.8% and the overall recall was 50.6%.
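For readers unfamiliar with the two measures, the sketch below shows one common way to compute them for a partially tagged corpus. The abstract does not spell out the exact formulas, so the definitions here are assumptions: precision is taken as correctly tagged instances over all tagged instances, and recall as correctly tagged instances over all gold-standard instances. The function name and the toy data are hypothetical.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall for a partially tagged corpus.

    predicted: dict mapping instance id -> sense, for tagged instances only.
    gold: dict mapping instance id -> sense, for every instance.
    Assumed definitions:
      precision = correct / number of tagged instances
      recall    = correct / number of gold-standard instances
    """
    tagged = [i for i in predicted if i in gold]
    correct = sum(1 for i in tagged if predicted[i] == gold[i])
    precision = correct / len(tagged) if tagged else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy example: four gold instances, two of them tagged, one tag correct.
gold = {1: "a", 2: "a", 3: "b", 4: "b"}
predicted = {1: "a", 3: "a"}
print(precision_recall(predicted, gold))  # (0.5, 0.25)
```

Under these definitions, a method that tags only when it has evidence can reach high precision while leaving many instances untagged, which lowers recall.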

Conclusions. UMLS conceptual relations and MEDLINE abstracts can be used to automatically acquire the knowledge needed to resolve ambiguity when mapping free text to UMLS concepts.



