First published June 28, 2007 as JAMIA PrePrint; doi:10.1197/jamia.M2215
J Am Med Inform Assoc. 2007;14:651-661. DOI 10.1197/jamia.M2215.
© 2007 American Medical Informatics Association


Model Formulation

A Document Clustering and Ranking System for Exploring MEDLINE Citations

Yongjing Lin, MS (a), Wenyuan Li, PhD (a), Keke Chen, PhD (c), and Ying Liu, PhD (a, b, *)

a Laboratory for Bioinformatics and Medical Informatics, Department of Computer Science, University of Texas at Dallas, Richardson, TX
b Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, TX
c Yahoo!, Santa Clara, CA.

* Correspondence and reprints: Ying Liu, PhD, Department of Computer Science, P.O. Box 830688; MS EC31, University of Texas at Dallas, Richardson, TX 75083-0688 (Email: ying.liu{at}utdallas.edu).

Received for publication: 07/17/06; accepted for publication: 05/20/07.

Objective: A major problem in biomedical informatics is how best to present information retrieval results. When a single query retrieves many results, showing them as one long list often gives the user a poor overview. To present users with smaller sets of relevant citations, this study developed an approach that retrieved MEDLINE citations, organized them into topical groups, and prioritized the important citations within each group.

Design: We built a text mining framework for automatic document clustering and ranking that organized the MEDLINE citations returned by simple PubMed queries. The system grouped the retrieved citations into clusters, ranked the citations within each cluster, and generated a set of keywords and MeSH terms describing the common theme of each cluster.
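The abstract does not name the clustering algorithm or the keyword extraction method, so the following is only a minimal sketch of the cluster-labeling step, assuming cluster labels have already been computed by some clustering step. The scoring rule (share of in-cluster documents containing a term minus share of out-of-cluster documents containing it), the helper names `tokenize` and `cluster_keywords`, and the sample titles are all hypothetical illustrations, not the authors' method; MeSH term extraction is not shown.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic tokens of 4+ characters (a crude stand-in for
    # real stop-word removal and stemming).
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= 4]

def cluster_keywords(docs, labels, top_n=3):
    """Return {cluster_label: [keywords]} for documents with pre-computed cluster labels.

    A term's score is the fraction of in-cluster documents containing it minus the
    fraction of out-of-cluster documents containing it, so terms shared across
    clusters are pushed below terms specific to one cluster.
    """
    doc_terms = [set(tokenize(d)) for d in docs]
    keywords = {}
    for label in sorted(set(labels)):
        inside = [terms for terms, lab in zip(doc_terms, labels) if lab == label]
        outside = [terms for terms, lab in zip(doc_terms, labels) if lab != label]
        df_in = Counter(t for terms in inside for t in terms)
        df_out = Counter(t for terms in outside for t in terms)

        def score(term):
            out_frac = df_out[term] / len(outside) if outside else 0.0
            return df_in[term] / len(inside) - out_frac

        # Highest score first; ties broken alphabetically so output is deterministic.
        keywords[label] = sorted(df_in, key=lambda t: (-score(t), t))[:top_n]
    return keywords

# Hypothetical titles and cluster labels, as a clustering step might produce them.
DOCS = [
    "Breast cancer surgery outcomes",
    "Sentinel node biopsy in breast cancer",
    "Colorectal cancer liver metastases resection",
    "Hepatic resection for colorectal metastases",
]
LABELS = [0, 0, 1, 1]

print(cluster_keywords(DOCS, LABELS))
```

Here "cancer" appears in both clusters and so is outranked by "breast" in cluster 0 and drops out of the top terms of cluster 1 entirely.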

Measurements: Several candidate ranking functions were compared: citation count per year (CCPY), citation count (CC), and journal impact factor (JIF). To evaluate the framework, we treated the articles selected by the Surgical Oncology Society as the "important" articles.
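A minimal sketch of the three ranking functions, assuming each citation carries a publication year, a citation count, and its journal's impact factor. How the study counted an article's age is not stated in the abstract; here the publication year is assumed to count as year 1. The `Article` record, the identifiers "A", "B", "C", and their numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Article:
    pmid: str                 # hypothetical identifiers in the example below
    pub_year: int
    citation_count: int       # CC
    impact_factor: float      # JIF of the publishing journal

def ccpy(article, current_year):
    # Citation count per year. Assumption: the publication year counts as year 1,
    # so a brand-new article does not cause division by zero.
    years = max(current_year - article.pub_year + 1, 1)
    return article.citation_count / years

def rank(articles, key):
    # Highest score first; sorted() is stable, so ties keep their input order.
    return [a.pmid for a in sorted(articles, key=key, reverse=True)]

ARTICLES = [
    Article("A", 1990, 300, 5.0),
    Article("B", 2005, 60, 2.0),
    Article("C", 2000, 100, 8.0),
]
YEAR = 2007

print("CC:  ", rank(ARTICLES, key=lambda a: a.citation_count))    # ['A', 'C', 'B']
print("CCPY:", rank(ARTICLES, key=lambda a: ccpy(a, YEAR)))       # ['B', 'A', 'C']
print("JIF: ", rank(ARTICLES, key=lambda a: a.impact_factor))     # ['C', 'A', 'B']
```

The three functions order the same articles differently: CC favors the old, heavily cited article A, CCPY favors the recent article B (60 citations over 3 years), and JIF favors C because it appeared in the highest-impact journal.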

Results: CCPY outperformed CC and JIF; that is, CCPY ranked the important articles higher than the other two functions did. In addition, the text clustering and knowledge extraction strategy grouped the retrieval results into informative clusters, as shown by the keywords and MeSH terms extracted from the documents in each cluster.
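The abstract does not state which metric was used to decide that one ranking placed important articles better than another. One common choice is precision at k, the fraction of the top k ranked citations that are in the "important" set; the sketch below assumes that metric. The rankings and the important set are hypothetical and are not the study's data.

```python
def precision_at_k(ranked_ids, important_ids, k):
    """Fraction of the top-k ranked articles that are in the 'important' set."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return sum(1 for pid in ranked_ids[:k] if pid in important_ids) / k

# Hypothetical rankings of the same four citations and a hypothetical expert-selected set.
IMPORTANT = {"p1", "p3"}
RANKINGS = {
    "CCPY": ["p1", "p3", "p2", "p4"],
    "CC": ["p1", "p2", "p3", "p4"],
    "JIF": ["p2", "p4", "p1", "p3"],
}

for name, ranking in RANKINGS.items():
    print(name, precision_at_k(ranking, IMPORTANT, k=2))
```

Dividing by k rather than by the number of returned items means a short result list is penalized for the slots it cannot fill.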

Conclusions: The text mining system effectively integrated text clustering, text summarization, and text ranking, and organized MEDLINE retrieval results into distinct topical groups.







Copyright © 2007 by the American Medical Informatics Association.