Submitted on July 12, 2005
Accepted on September 16, 2005
Affiliation of the authors: 1 School of Health Information Sciences, The University of Texas Health Science Center at Houston, Houston, TX; 2 Department of Biomedical Informatics, Vanderbilt University, Nashville, TN; 3 Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR
* To whom correspondence should be addressed.
Objective To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant.
Design and Measurements A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, where an article was defined as important if it was included in a pre-existing bibliography of important literature in surgical oncology.
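As a rough illustration of the two citation-based rankings named above, the following minimal sketch computes a simple citation count and a basic PageRank over a toy citation graph. The graph, the article identifiers, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration; the abstract does not specify how the study implemented either algorithm.

```python
from collections import defaultdict


def citation_counts(citations):
    """Count how often each article is cited.

    citations maps an article id to the list of article ids it cites.
    """
    counts = defaultdict(int)
    for citing, cited_list in citations.items():
        counts[citing] += 0  # make sure uncited articles appear with a count of 0
        for cited in cited_list:
            counts[cited] += 1
    return dict(counts)


def pagerank(citations, damping=0.85, iterations=50):
    """Basic power-iteration PageRank; damping and iterations are assumed values."""
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by articles that cite nothing is spread evenly over all articles.
        dangling = sum(rank[node] for node in nodes if not citations.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for citing, cited_list in citations.items():
            if cited_list:
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B cite C, and C cites D.
toy_citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}

counts = citation_counts(toy_citations)
ranks = pagerank(toy_citations)

top_by_count = max(counts, key=counts.get)
top_by_pagerank = max(ranks, key=ranks.get)
```

In this toy graph the two rankings disagree: C has the most direct citations, while D ranks highest under PageRank because its single citation comes from a well-cited article.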
Results Citation-based algorithms were more effective than non-citation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results, compared to 0.85 for the best non-citation-based algorithm (p < 0.001). We saw similar differences between citation-based and non-citation-based algorithms at 10, 20, 50, 200, 500, and 1000 results (p < 0.001). Citation lag affects the performance of PageRank more than that of simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than non-citation-based algorithms.
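The measure reported above, the number of important articles among the first k results, can be sketched as follows. The function name, the article identifiers, and the small cutoffs are hypothetical and serve only to show the counting; they are not data from the study.

```python
def important_in_top_k(ranked_ids, important_ids, k):
    """Count how many of the first k ranked articles are in the important set."""
    important = set(important_ids)
    return sum(1 for article_id in ranked_ids[:k] if article_id in important)


# Hypothetical ranked output of one algorithm and a hypothetical bibliography.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6"]
bibliography = {"p2", "p5", "p9"}

# Small cutoffs for the toy data; the study used 10, 20, 50, 100, 200, 500, and 1000.
hits_by_cutoff = {k: important_in_top_k(ranked, bibliography, k) for k in (3, 5, 10)}
```

Averaging these counts over many queries at each cutoff would give figures comparable in kind to those reported above.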
Conclusion Algorithms that have proven successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.