First published November 23, 2004 as JAMIA PrePrint; doi:10.1197/jamia.M1640
An erratum has been published for this article.
J Am Med Inform Assoc. 2005;12:121-129. DOI 10.1197/jamia.M1640.
© 2005 American Medical Informatics Association


Application of Information Technology

A Statistical Approach to Scanning the Biomedical Literature for Pharmacogenetics Knowledge

Daniel L. Rubin, MD, MS, Caroline F. Thorn, PhD, Teri E. Klein, PhD and Russ B. Altman, MD, PhD

Affiliations of the authors: Section of Medical Informatics, Stanford University, Stanford, CA (DLR, TEK, RBA); Department of Genetics, Stanford Medical Center, Stanford, CA (CFT, TEK, RBA).

Correspondence and reprints: Daniel L. Rubin, MD, MS, Section of Medical Informatics, MSOB X-215, Stanford, CA 94305; e-mail: <rubin{at}smi.stanford.edu>.

Received for publication: 06/16/04; accepted for publication: 10/20/04.

Objective: Biomedical databases summarize current scientific knowledge, but building them generally requires years of laborious curation, much of it spent identifying pertinent articles and data in the voluminous biomedical literature. Manually extracting the useful information embedded in that literature is difficult, and automated text analysis tools are becoming increasingly essential to assist in these curation activities. The authors' goal was to develop an automated method to identify Medline citations that contain pharmacogenetics data pertaining to gene–drug relationships.

Design: The authors built and evaluated several candidate statistical models that characterize pharmacogenetics articles in terms of word usage and the profile of Medical Subject Headings (MeSH) used in those articles. The best-performing model was used to scan the entire Medline article database (11 million articles) to identify candidate pharmacogenetics articles.
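To make the idea of characterizing articles by word usage and MeSH profile concrete, the following is a minimal sketch of one possible statistical model. The abstract does not say which models were evaluated; the naive Bayes classifier here, the `tokens` helper, the `MESH:` feature prefix, and the toy training citations are all assumptions introduced for illustration.

```python
import math
from collections import Counter

def tokens(text, mesh_terms):
    """Combine free-text words and MeSH headings into one feature list.
    The "MESH:" prefix is a hypothetical convention to keep the two apart."""
    words = text.lower().split()
    mesh = ["MESH:" + term.lower() for term in mesh_terms]
    return words + mesh

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed model,
    not necessarily the one used by the authors)."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, feats, is_pharmacogenetics):
        self.counts[is_pharmacogenetics].update(feats)
        self.docs[is_pharmacogenetics] += 1

    def log_odds(self, feats):
        """Return log P(relevant | feats) - log P(not relevant | feats)."""
        vocab = set(self.counts[True]) | set(self.counts[False])
        size = len(vocab)
        totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        score = math.log(self.docs[True] / self.docs[False])
        for feat in feats:
            if feat not in vocab:
                continue  # features never seen in training carry no evidence
            p_pos = (self.counts[True][feat] + 1) / (totals[True] + size)
            p_neg = (self.counts[False][feat] + 1) / (totals[False] + size)
            score += math.log(p_pos / p_neg)
        return score

# Toy, invented training citations (not data from the study).
model = NaiveBayes()
model.train(tokens("CYP2D6 polymorphism alters codeine metabolism",
                   ["Pharmacogenetics", "Cytochrome P-450 CYP2D6"]), True)
model.train(tokens("TPMT genotype and thiopurine toxicity",
                   ["Pharmacogenetics", "Methyltransferases"]), True)
model.train(tokens("Surgical repair of knee ligament injury",
                   ["Orthopedic Procedures"]), False)
model.train(tokens("Hospital staffing and patient satisfaction survey",
                   ["Health Services Research"]), False)

candidate = tokens("CYP2D6 genotype and drug metabolism", ["Pharmacogenetics"])
unrelated = tokens("Knee injury repair", ["Orthopedic Procedures"])
print(model.log_odds(candidate), model.log_odds(unrelated))
```

A positive log-odds score marks a citation as a candidate pharmacogenetics article. In a scan of a large collection, the decision threshold would presumably be tuned on held-out labeled articles rather than fixed at zero.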

Results: A pharmacologist reviewed a sample of the articles identified by scanning Medline to assess the precision of the method. The authors' approach identified 4,892 pharmacogenetics articles in the literature with 92% precision. The automated method acquired these articles in a fraction of the time that accumulating them manually would be expected to take. The authors have built a Web resource (http://pharmdemo.stanford.edu/pharmdb/main.spy) to provide access to their results.
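Precision estimated from a reviewed sample carries sampling uncertainty, which a confidence interval makes explicit. The sketch below computes precision with a Wilson score interval. The function name and the counts in the usage lines are hypothetical; the abstract reports the 92% figure but not the size of the reviewed sample.

```python
import math

def precision_with_interval(relevant, reviewed, z=1.96):
    """Precision of a reviewed sample plus a Wilson score interval.

    relevant: reviewed articles the expert judged truly relevant
    reviewed: total articles the expert reviewed
    z:        normal quantile (1.96 gives roughly a 95% interval)
    """
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    p = relevant / reviewed
    denom = 1 + z * z / reviewed
    center = (p + z * z / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed
                         + z * z / (4 * reviewed ** 2)) / denom
    return p, center - half, center + half

# Illustrative counts only: NOT the sample size used in the study.
precision, low, high = precision_with_interval(relevant=92, reviewed=100)
print(f"precision={precision:.2f}, interval=({low:.3f}, {high:.3f})")
```

The Wilson interval is used here because it stays within [0, 1] and behaves reasonably when precision is close to 1, where the simpler normal approximation can overshoot.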

Conclusion: A statistical classification approach can screen the primary literature to identify pharmacogenetics articles with high precision. Such methods may assist curators in acquiring pertinent literature when building biomedical databases.




This article has been cited by other articles:

P. Agarwal and D. B. Searls
Literature mining in support of drug discovery
Brief Bioinform, September 27, 2008; (2008) bbn035v1.

D. L. Rubin, N. H. Shah, and N. F. Noy
Biomedical ontologies: a functional perspective
Brief Bioinform, January 1, 2008; 9(1): 75 - 90.

B. K. Lin, M. Clyne, M. Walsh, O. Gomez, W. Yu, M. Gwinn, and M. J. Khoury
Tracking the Epidemiology of Human Genes in the Literature: The HuGE Published Literature Database
Am. J. Epidemiol., July 1, 2006; 164(1): 1 - 4.

T. Goetz and C.-W. von der Lieth
PubFinder: a tool for improving retrieval rate of relevant PubMed abstracts
Nucleic Acids Res., July 1, 2005; 33(suppl_2): W774 - W778.



