
First published April 18, 2006 as JAMIA PrePrint; doi:10.1197/jamia.M2031
J Am Med Inform Assoc. 2006;13:446-455. DOI 10.1197/jamia.M2031.
© 2006 American Medical Informatics Association


Research Paper

A Comparison of Citation Metrics to Machine Learning Filters for the Identification of High Quality MEDLINE Documents

Yindalon Aphinyanaphongs, MS, MS, Alexander Statnikov, MS, MS and Constantin F. Aliferis, MD, PhD*

Discovery Systems Laboratory, Department of Biomedical Informatics, Vanderbilt University, Nashville, TN

* Correspondence and reprints: Constantin F. Aliferis, MD, PhD, Department of Biomedical Informatics, Eskind Biomedical Library, room 412, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232. (Email: constantin.aliferis{at}vanderbilt.edu).

Received for publication: 12/07/05; accepted for publication: 03/20/06.

OBJECTIVE: The present study explores the discriminatory performance of existing and novel gold-standard-specific machine learning (GSS-ML) focused filter models (i.e., models built specifically for a retrieval task and for the gold standard against which they are evaluated) and compares it with that of citation counts, impact factors, and non-specific machine learning (NS-ML) models (i.e., models built for a different task and/or a different gold standard).

DESIGN: Three gold standard corpora were constructed using the SSOAB bibliography, the ACPJ-cited treatment articles, and the ACPJ-cited etiology articles. Citation counts and impact factors were obtained for each article. Support vector machine models classified the articles using combinations of content, impact factors, and citation counts as predictors.
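As an illustration of what "combinations of content, impact factors, and citation counts as predictors" can mean, the sketch below builds one feature vector per article. The abstract does not specify the encoding: the binary bag-of-words content representation, the log(1 + x) scaling of citation counts, and the names `build_features` and `vocabulary` are all hypothetical assumptions, not the authors' method.

```python
import math
from typing import List, Optional


def build_features(
    text: str,
    vocabulary: List[str],
    citation_count: Optional[int] = None,
    impact_factor: Optional[float] = None,
) -> List[float]:
    """Hypothetical encoding: binary term presence, optionally followed by citation metrics."""
    tokens = set(text.lower().split())
    # Content part: 1.0 if the vocabulary term occurs in the article text, else 0.0.
    features = [1.0 if term in tokens else 0.0 for term in vocabulary]
    # Citation metrics are appended only when that predictor combination is requested.
    if citation_count is not None:
        features.append(math.log1p(citation_count))  # assumed log scaling, not from the source
    if impact_factor is not None:
        features.append(float(impact_factor))
    return features
```

Keeping the metric columns optional lets a single function produce the content-only, content-plus-impact-factor, and content-plus-citation-count variants that are then passed to a classifier.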

MEASUREMENTS: Discriminatory performance was estimated using the area under the receiver operating characteristic curve and n-fold cross-validation.
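A minimal sketch of the measurement described above: the area under the ROC curve computed as the Mann-Whitney statistic, estimated over stratified n-fold cross-validation. Training the support vector machine itself is out of scope here, so `fit` is a hypothetical stand-in that takes the training split and returns a scoring function. Stratifying the folds and averaging per-fold AUCs (rather than pooling test scores) are assumptions, not details given in the abstract.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Scorer = Callable[[Vector], float]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], n_folds: int, seed: int = 0) -> List[List[int]]:
    """Split example indices into n folds, dealing each class out round-robin after shuffling."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[k % n_folds].append(i)
    return folds


def cross_validated_auc(
    X: Sequence[Vector],
    y: Sequence[int],
    n_folds: int,
    fit: Callable[[List[Vector], List[int]], Scorer],
    seed: int = 0,
) -> float:
    """Mean test-fold AUC; `fit` is a hypothetical trainer that returns a scoring function."""
    fold_aucs = []
    for test_idx in stratified_folds(y, n_folds, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_aucs.append(auc([score(X[i]) for i in test_idx], [y[i] for i in test_idx]))
    return sum(fold_aucs) / len(fold_aucs)
```

A citation-metric baseline needs no training step: passing an article's raw citation counts (or its journal's impact factors) directly as `scores` to `auc` ranks the articles by that metric alone, which puts it on the same scale as the classifier's AUC.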

RESULTS: For all three gold standards and tasks, GSS-ML filters outperformed citation counts, impact factors, and NS-ML filters. Combining content with impact factors or citation counts produced no or only negligible improvement over the GSS-ML filters.

CONCLUSIONS: These experiments provide evidence that when information retrieval filters are focused on a retrieval task and its corresponding gold standard, the filter models must be built specifically for that task and gold standard. Under those conditions, machine learning filters outperform standard citation metrics. Furthermore, citation counts and impact factors add only marginal value to discriminatory performance. The better performance of citation metrics over machine learning reported by previous research on one of the corpora examined here is attributed to that research's use of machine learning filters built for a different gold standard and task.
