First published July 23, 2002 as JAMIA PrePrint; doi:10.1197/jamia.M1139
Journal of the American Medical Informatics Association 9:612-620 (2002)
© 2002 American Medical Informatics Association


Research Paper

Creating an Online Dictionary of Abbreviations from MEDLINE

Jeffrey T. Chang, Hinrich Schütze, PhD and Russ B. Altman, MD, PhD

Affiliations of the authors: Department of Genetics, Stanford Medical Informatics, Stanford, California (JTC, RBA); Novation Biosciences, Stanford, California (HS).

Correspondence and reprints: Russ B. Altman, MD, PhD, Department of Genetics, Stanford Medical Informatics, Stanford School of Medicine, Medical School Office Building, X-215, 251 Campus Dr., Stanford, CA 94305; e-mail: <russ.altman{at}stanford.edu>.

Objective. The growth of the biomedical literature presents special challenges for both human readers and automatic algorithms. One such challenge derives from the common and uncontrolled use of abbreviations in the literature. Each additional abbreviation increases the effective size of the vocabulary for a field. Therefore, to create an automatically generated and maintained lexicon of abbreviations, we have developed an algorithm to match abbreviations in text with their expansions.

Design. Our method uses a statistical learning algorithm, logistic regression, to score abbreviation expansions based on their resemblance to a training set of human-annotated abbreviations. We applied it to Medstract, a corpus of MEDLINE abstracts in which abbreviations and their expansions have been manually annotated. We then ran the algorithm on all abstracts in MEDLINE, creating a dictionary of biomedical abbreviations. To test the coverage of the database, we used an independently created list of abbreviations from the China Medical Tribune.
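
As an illustration of the scoring idea described under Design, the following is a minimal sketch of logistic regression applied to candidate abbreviation expansions. It is not the authors' implementation: the three features, the toy training pairs, and the function names (`features`, `fit`, `score`) are assumptions made for this example, and the abstract does not specify the features actually used.

```python
import math

def features(abbrev, expansion):
    """Hypothetical feature vector: [bias, initial-letter match, length gap]."""
    letters = [c for c in abbrev.lower() if c.isalnum()]
    initials = [w[0] for w in expansion.lower().split()]
    pos, matched = 0, 0
    for ch in letters:
        # Look for this letter among the remaining word-initial letters.
        try:
            j = initials.index(ch, pos)
        except ValueError:
            continue
        matched += 1
        pos = j + 1
    frac = matched / len(letters) if letters else 0.0
    return [1.0, frac, float(abs(len(initials) - len(letters)))]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(pairs, labels, steps=500, lr=0.5):
    """Fit logistic regression weights by batch gradient descent."""
    xs = [features(a, e) for a, e in pairs]
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, x)))
            for k, xk in enumerate(x):
                grad[k] += (p - y) * xk
        w = [wk - lr * g / len(xs) for wk, g in zip(w, grad)]
    return w

def score(w, abbrev, expansion):
    """Estimated probability that `expansion` is the expansion of `abbrev`."""
    x = features(abbrev, expansion)
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)))

# Toy annotated pairs (invented for this sketch): 1 = correct expansion, 0 = not.
train_pairs = [
    ("PCR", "polymerase chain reaction"),
    ("DNA", "deoxyribonucleic acid"),
    ("MRI", "magnetic resonance imaging"),
    ("PCR", "the reaction was performed"),
    ("MRI", "patients were scanned with"),
    ("DNA", "was extracted from"),
]
train_labels = [1, 1, 1, 0, 0, 0]
weights = fit(train_pairs, train_labels)

good = score(weights, "ATP", "adenosine triphosphate")
bad = score(weights, "ATP", "levels were measured in")
print(good > bad)
```

In a sketch like this, candidates whose score exceeds a chosen threshold would be kept; raising the threshold trades recall for precision.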

Measurements. We measured the recall and precision of the algorithm in identifying abbreviations from the Medstract corpus. We also measured the recall when searching for abbreviations from the China Medical Tribune against the database.
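
For readers unfamiliar with these measures, the sketch below shows how precision and recall can be computed by comparing a set of predicted abbreviation/expansion pairs with a manually annotated gold standard. The function name and the example pairs are invented for illustration and are not taken from Medstract.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical gold-standard annotations and algorithm output.
gold_pairs = {
    ("PCR", "polymerase chain reaction"),
    ("MRI", "magnetic resonance imaging"),
    ("DNA", "deoxyribonucleic acid"),
    ("ATP", "adenosine triphosphate"),
}
predicted_pairs = {
    ("PCR", "polymerase chain reaction"),
    ("MRI", "magnetic resonance imaging"),
    ("DNA", "was extracted from"),
}

p, r = precision_recall(predicted_pairs, gold_pairs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the fraction of predicted pairs that are correct, and recall is the fraction of gold-standard pairs that were found.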

Results. On the Medstract corpus, our algorithm achieves up to 83% recall at 80% precision. Applying the algorithm to all of MEDLINE yielded a database of 781,632 high-scoring abbreviations. Of all the abbreviations in the list from the China Medical Tribune, 88% were in the database.

Conclusion. We have developed an algorithm to identify abbreviations from text. We are making this available as a public abbreviation server at http://abbreviation.stanford.edu/.



