
First published August 21, 2007 as JAMIA PrePrint; doi:10.1197/jamia.M2080
J Am Med Inform Assoc. 2007;14:788-797. DOI 10.1197/jamia.M2080.
© 2007 American Medical Informatics Association


Research Paper

Topological Analysis of Large-scale Biomedical Terminology Structures

Michael E. Bales, MPH^a, Yves A. Lussier, MD^1,b, and Stephen B. Johnson, PhD^1,a,*

a Department of Biomedical Informatics, Columbia University, New York, NY
b Department of Medicine, University of Chicago, Chicago, IL.

* Correspondence: Stephen Johnson, Department of Biomedical Informatics, Columbia University, Vanderbilt Clinic, 5th Floor, 622 West 168th Street, New York, NY 10032 (Email: stephen.johnson{at}dbmi.columbia.edu).

Received for publication: 02/10/06; accepted for publication: 07/26/07.

Objective: To characterize global structural features of large-scale biomedical terminologies using currently emerging statistical approaches.

Design: Given rapid growth of terminologies, this research was designed to address scalability. We selected 16 terminologies covering a variety of domains from the UMLS Metathesaurus, a collection of terminological systems. Each was modeled as a network in which nodes were atomic concepts and links were relationships asserted by the source vocabulary. For comparison against each terminology we created three random networks of equivalent size and density.
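The modeling step described in the Design can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the concept identifiers (`C1` ... `C5`), the function names, and the choice of a uniform random graph with the same node and link counts as the "equivalent size and density" baseline are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_network(relations):
    """Build an undirected adjacency map from (concept, concept) pairs."""
    adj = defaultdict(set)
    for a, b in relations:
        if a == b:
            continue  # ignore self-relationships in this sketch
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def edge_count(adj):
    """Number of undirected links (each link is stored twice)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def random_equivalent(adj, seed=None):
    """Random network with the same node set and the same number of links.

    Assumption: 'equivalent size and density' is read here as equal node
    and link counts, with links placed uniformly at random.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    target = edge_count(adj)
    rand = {n: set() for n in nodes}
    added = 0
    while added < target:
        a, b = rng.sample(nodes, 2)  # two distinct nodes
        if b not in rand[a]:
            rand[a].add(b)
            rand[b].add(a)
            added += 1
    return rand


# Hypothetical toy relationships; real input would come from a source vocabulary.
relations = [("C1", "C2"), ("C1", "C3"), ("C2", "C3"), ("C3", "C4"), ("C4", "C5")]
net = build_network(relations)

# Three random comparison networks per terminology, as in the Design.
baselines = [random_equivalent(net, seed=s) for s in range(3)]
```

Rejection sampling of link endpoints is adequate for sparse networks; a dense network would call for a different sampling strategy.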

Measurements: Average node degree, node degree distribution, clustering coefficient, average path length.
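For readers unfamiliar with these four measurements, the sketch below computes each on a small toy network. It assumes an undirected adjacency map, uses the mean local (Watts-Strogatz) clustering coefficient, and averages path length over connected node pairs only; the abstract does not state which variants the authors used, so these choices are assumptions.

```python
from collections import Counter, deque


def average_degree(adj):
    """Mean number of links per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def degree_distribution(adj):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def clustering_coefficient(adj):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nb = list(nbrs)
        # Count links that exist among this node's neighbors.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over connected pairs, via BFS from each node."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}

print(average_degree(toy))
print(dict(degree_distribution(toy)))
print(clustering_coefficient(toy))
print(average_path_length(toy))
```

Running BFS from every node costs time proportional to nodes times links, so very large terminologies would likely require sampling of source nodes.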

Results: Eight of 16 terminologies exhibited the small-world characteristics of a short average path length and strong local clustering. An overlapping subset of nine exhibited a power-law distribution in node degrees, indicative of a scale-free architecture. We attribute these features to specific design constraints. Constraints on node connectivity, common in more synthetic classification systems, localize the effects of changes and deletions. In contrast, small-world and scale-free features, common in comprehensive medical terminologies, promote flexible navigation and less restrictive, organic-like growth.
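One rough way to operationalize the two properties reported above is shown below. The small-world check compares a terminology's clustering coefficient and average path length against those of a random network of equivalent size; the power-law check estimates the slope of the degree distribution on log-log axes. The thresholds (`c_ratio`, `l_ratio`), the function names, and the use of a simple least-squares fit are assumptions for illustration only; the abstract does not specify the authors' criteria, and more rigorous power-law fitting methods exist.

```python
import math


def loglog_slope(degree_counts):
    """Least-squares slope of log(count) against log(degree).

    For a power-law distribution count ~ degree ** (-gamma), the slope
    approximates -gamma. This is a crude estimate, not a formal test.
    """
    pts = [
        (math.log(k), math.log(c))
        for k, c in degree_counts.items()
        if k > 0 and c > 0
    ]
    if len(pts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


def looks_small_world(c, l, c_rand, l_rand, c_ratio=2.0, l_ratio=1.5):
    """Heuristic: clustering well above random, path length close to random.

    c_ratio and l_ratio are illustrative thresholds, not values from the paper.
    """
    return c >= c_ratio * c_rand and l <= l_ratio * l_rand


# Hypothetical degree counts that follow count = 1000 * degree ** -2 exactly.
counts = {1: 1000, 2: 250, 10: 10}
slope = loglog_slope(counts)

# Hypothetical measurements for a terminology and its random baseline.
flag = looks_small_world(c=0.40, l=5.0, c_rand=0.01, l_rand=4.5)
```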

Conclusion: Although thought of as synthetic, grid-like structures, some controlled terminologies are structurally indistinguishable from natural language networks. This paradoxical result suggests that terminology structure is shaped not only by formal logic-based semantics, but also by rules analogous to those that govern social networks and biological systems. Graph-theoretic modeling shows early promise as a framework for describing terminology structure. Deeper understanding of these techniques may inform the development of scalable terminologies and ontologies.






