Submitted on February 15, 2002
Accepted on May 22, 2002
Affiliation of the authors: 1 City University of New York; 2 Columbia University; 3 City University of New York
* To whom correspondence should be addressed.
Motivation The UMLS has been used in natural language processing applications such as information retrieval and information extraction systems, for which mapping free text to UMLS concepts is an important step. To improve this mapping, we need a method to disambiguate terms that correspond to multiple UMLS concepts. In the general English domain, machine-learning techniques have been applied to sense-tagged corpora, in which the senses (or concepts) of ambiguous terms have been annotated, mostly manually. Sense disambiguation classifiers are then derived from these corpora to determine the senses of ambiguous terms automatically. However, manual annotation of a corpus is expensive. We propose an automatic method that constructs sense-tagged corpora for ambiguous terms in the UMLS using MEDLINE abstracts.
Methods For a term W that represents multiple UMLS concepts, a collection of MEDLINE abstracts that contain W is extracted. For each abstract in the collection, occurrences of concepts that have relations with W as defined in the UMLS are automatically identified. A sense-tagged corpus, where senses of W are annotated, is then derived based on those identified concepts. The method was evaluated on a set of 35 frequently occurring ambiguous biomedical abbreviations using a gold standard set that was automatically derived. The quality of the derived sense-tagged corpus was measured using precision and recall.
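The tagging step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whole-phrase matching, and the decision rule (tag an abstract only when related terms of exactly one sense occur in it) are assumptions made for illustration, and the related-term lists are expected to be supplied by the caller rather than read from the UMLS.

```python
import re


def contains(text, phrase):
    """Case-insensitive whole-phrase match (an assumed matching strategy)."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def tag_abstract(abstract, term, related_by_sense):
    """Return the single sense supported by the abstract, or None.

    related_by_sense maps each candidate sense of `term` to a list of
    related terms (hypothetical input; in the paper these come from UMLS
    conceptual relations).
    """
    if not contains(abstract, term):
        return None
    supported = [
        sense
        for sense, related in related_by_sense.items()
        if any(contains(abstract, r) for r in related)
    ]
    # Assumed rule: keep the abstract only when the evidence is unambiguous.
    return supported[0] if len(supported) == 1 else None


def build_corpus(abstracts, term, related_by_sense):
    """Collect (abstract, sense) pairs for the abstracts that could be tagged."""
    corpus = []
    for abstract in abstracts:
        sense = tag_abstract(abstract, term, related_by_sense)
        if sense is not None:
            corpus.append((abstract, sense))
    return corpus
```

Under this assumed rule, abstracts with no related terms, or with related terms from several senses, are left out of the corpus. That favors correctness of the tags over coverage.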
Results The derived sense-tagged corpus had an overall precision of 92.9% and an overall recall of 47.4%. After removing rare senses and ignoring abbreviations with closely related senses, the overall precision was 96.8% and the overall recall was 50.6%.
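The abstract does not spell out how precision and recall were computed. The sketch below assumes a common convention: precision is the fraction of tagged instances whose tag agrees with the gold standard, and recall is the fraction of all gold-standard instances that received a correct tag. The function name and the use of `None` for untagged instances are illustrative choices.

```python
def precision_recall(predicted, gold):
    """Compute (precision, recall) for a sense-tagged corpus.

    predicted: one sense label per instance, or None if left untagged.
    gold: the gold-standard sense label for each instance, same order.
    """
    tagged = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in tagged if p == g)
    precision = correct / len(tagged) if tagged else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

Under this convention, every untagged instance lowers recall but leaves precision unchanged, which is one way a corpus can combine high precision with much lower recall.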
Conclusions UMLS conceptual relations and MEDLINE abstracts can be used to automatically acquire the knowledge needed to resolve ambiguity when mapping free text to UMLS concepts.