
First published April 25, 2007 as JAMIA PrePrint; doi:10.1197/jamia.M2314
Journal of the American Medical Informatics Association 2007;14(4):467-477
© 2007 American Medical Informatics Association


A more recent version of this article appeared on July 1, 2007

Submitted on October 25, 2006
Accepted on April 9, 2007

Semantic classification of biomedical concepts using distributional similarity

Jung-Wei Fan, MS¹* and Carol Friedman, PhD¹

Affiliation of the authors: ¹ Department of Biomedical Informatics, Columbia University, New York, NY

* To whom correspondence should be addressed.

Objective To develop an automated, high-throughput and reproducible method for reclassifying and validating ontological concepts for natural language processing applications.

Design We developed a distributional similarity approach to classify the Unified Medical Language System (UMLS) concepts. Classification models were built for seven broad biomedically relevant semantic classes created by grouping subsets of the UMLS semantic types. We used contextual features based on syntactic properties obtained from two different large corpora and used α-skew divergence as the similarity measure.
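
As an illustration of the similarity measure named above, the following is a minimal sketch of α-skew divergence in its commonly cited form, s_α(q, r) = KL(r ‖ αq + (1 − α)r), applied to feature-count profiles. The abstract does not give the formula, the value of α, the feature inventory, or the decision rule, so the default α = 0.99, the feature strings, the class names, the choice of which distribution plays the role of q, and the "lowest divergence wins" ranking are all assumptions made for illustration, not the authors' implementation.

```python
import math


def normalize(counts):
    """Turn raw feature counts into a probability distribution."""
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()}


def skew_divergence(q, r, alpha=0.99):
    """Alpha-skew divergence KL(r || alpha*q + (1-alpha)*r).

    Mixing a little of r into q keeps the denominator positive even
    when q assigns zero probability to a feature seen in r.
    alpha=0.99 is an assumed value, not taken from the abstract.
    """
    total = 0.0
    for feature, r_f in r.items():
        if r_f == 0.0:
            continue  # 0 * log(0 / x) contributes nothing
        mix = alpha * q.get(feature, 0.0) + (1.0 - alpha) * r_f
        total += r_f * math.log(r_f / mix)
    return total


def rank_classes(concept_counts, class_counts, alpha=0.99):
    """Rank class names from most to least similar (lowest divergence first).

    Treating the class profile as q and the concept profile as r is an
    assumption of this sketch.
    """
    r = normalize(concept_counts)
    scores = {
        name: skew_divergence(normalize(counts), r, alpha)
        for name, counts in class_counts.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical syntactic-context features and class names.
concept = {"subj_of:inhibit": 3, "mod:oral": 2}
classes = {
    "drug": {"subj_of:inhibit": 10, "mod:oral": 5},
    "disorder": {"obj_of:treat": 8, "mod:chronic": 4},
}
print(rank_classes(concept, classes))
```

In this toy input the concept shares all of its context features with the hypothetical "drug" profile and none with "disorder", so "drug" is ranked first; the first two entries of the ranking would correspond to a top-2 recommendation.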

Measurements The testing sets were automatically generated based on the changes by the National Library of Medicine to the semantic classification of concepts from the UMLS 2005AA to the 2006AA release. Error rates were calculated and a misclassification analysis was performed.
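
One way to picture the automatic test-set generation is as a comparison of two release snapshots: a concept qualifies as a test case if its class assignment differs between the older and the newer release. The sketch below assumes each release has already been reduced to a mapping from concept identifier to a single broad class. That reduction, the function name, the identifiers, and the class names are hypothetical; the abstract does not describe the procedure at this level of detail.

```python
def changed_concepts(old_release, new_release):
    """Return {concept_id: (old_class, new_class)} for reclassified concepts.

    Concepts that appear in only one release are skipped, since there is
    no earlier or later classification to compare against.
    """
    changed = {}
    for concept_id, new_class in new_release.items():
        old_class = old_release.get(concept_id)
        if old_class is not None and old_class != new_class:
            changed[concept_id] = (old_class, new_class)
    return changed


# Hypothetical identifiers and classes, for illustration only.
release_old = {"C1": "disorder", "C2": "procedure", "C3": "drug"}
release_new = {"C1": "disorder", "C2": "device", "C4": "drug"}
print(changed_concepts(release_old, release_new))
```

Under these assumptions the newer class serves as the reference answer when scoring a prediction for each changed concept.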

Results The estimated lowest error rates were 0.198 and 0.116 when the correct classification was required to be covered by our top prediction and by our top two predictions, respectively.
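
The two error rates differ only in how many ranked predictions are allowed to contain the correct class. A minimal sketch of that calculation follows; the function name and the toy rankings are assumptions for illustration and do not reproduce the study's data.

```python
def top_k_error(rankings, gold, k):
    """Fraction of test concepts whose correct class is absent from the top k.

    rankings: one ranked list of predicted classes per test concept,
              best prediction first.
    gold:     the correct class for each test concept, in the same order.
    """
    if len(rankings) != len(gold):
        raise ValueError("rankings and gold must have the same length")
    misses = sum(
        1 for ranked, answer in zip(rankings, gold) if answer not in ranked[:k]
    )
    return misses / len(gold)


# Toy rankings over three hypothetical classes "a", "b", "c".
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["a", "c", "b"]]
gold = ["a", "a", "a", "b"]
print(top_k_error(rankings, gold, k=1))
print(top_k_error(rankings, gold, k=2))
```

Because every concept covered at k = 1 is also covered at k = 2, the top-2 error rate can never exceed the top-1 error rate, which matches the direction of the reported figures.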

Conclusion The results demonstrated that the distributional similarity approach can recommend high level semantic classification suitable for use in natural language processing.




This article has been cited by other articles:


J.-W. Fan and C. Friedman. Semantic reclassification of the UMLS concepts. Bioinformatics, September 1, 2008; 24(17): 1971-1973.




Copyright © 1994 by the American Medical Informatics Association.