| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Research Paper
Affiliation of the authors: Ariadne Genomics, Inc., Rockville, MD.
Correspondence and reprints: Nikolai Daraselia, PhD, Ariadne Genomics, Inc., 9700 Great Seneca Highway, Rockville, MD 20850; e-mail: <nikolai{at}ariadnegenomics.com>.
Received for publication: 09/08/03; accepted for publication: 01/11/04.
Objective: The aim of this study was to develop a practical and efficient protein identification system for biomedical corpora.
Design: The developed system, called ProtScan, uses a carefully constructed dictionary of mammalian proteins together with a specialized tokenization algorithm to identify and tag protein name occurrences in biomedical texts. It also takes advantage of the Medline "Name-of-Substance" (NOS) annotation. The ProtScan dictionaries were constructed semi-automatically from various public-domain sequence databases and then intensively curated by experts.
Measurements: The recall and precision of the system were determined using 1,000 randomly selected and hand-tagged Medline abstracts.
Results: The developed system identifies protein occurrences in Medline abstracts with 98% precision and 88% recall, and it processes approximately 300 abstracts per second. Without NOS annotation, precision and recall were 98.5% and 84%, respectively.
Conclusion: The developed system appears to be well suited for protein-based Medline indexing and can help to improve biomedical information retrieval. Further approaches to improving ProtScan's recall are also discussed.
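
As an illustration of the general approach summarized above, the following minimal Python sketch pairs a dictionary lookup over normalized tokens with a precision and recall calculation against hand-tagged spans. It is not ProtScan's implementation: the abstract does not specify the tokenization algorithm, so the tokenizer, the greedy longest-match strategy, the dictionary entries, and all function names here are assumptions made for the example.

```python
import re

# Hypothetical mini-dictionary mapping normalized token tuples to a canonical name.
# Entries are illustrative only and are not taken from ProtScan's dictionaries.
PROTEIN_DICT = {
    ("tumor", "necrosis", "factor", "alpha"): "TNF",
    ("tnf", "alpha"): "TNF",
    ("interleukin", "6"): "IL6",
    ("il", "6"): "IL6",
}
MAX_LEN = max(len(key) for key in PROTEIN_DICT)


def tokenize(text):
    """Lowercase and split into letter runs and digit runs, so 'IL-6' and 'IL6' both give ['il', '6']."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())


def tag_proteins(text):
    """Return (start, end, name) token spans found by greedy longest-match lookup (an assumed strategy)."""
    tokens = tokenize(text)
    hits, i = [], 0
    while i < len(tokens):
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            name = PROTEIN_DICT.get(tuple(tokens[i:i + n]))
            if name is not None:
                hits.append((i, i + n, name))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return hits


def precision_recall(predicted, gold):
    """Precision and recall of predicted spans against hand-tagged (gold) spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Calling `tag_proteins("IL-6 and TNF-alpha")` on this toy dictionary yields one span for each name variant, and `precision_recall` can then compare such spans with a manually tagged set, in the spirit of the 1,000-abstract evaluation described under Measurements.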