
First published October 26, 2006 as JAMIA PrePrint; doi:10.1197/jamia.M2178
J Am Med Inform Assoc. 2007;14:76-85. DOI 10.1197/jamia.M2178.
© 2007 American Medical Informatics Association


Methods paper

Finding Leading Indicators for Disease Outbreaks: Filtering, Cross-correlation, and Caveats

Ronald M. Bloom, MSa, David L. Buckeridge, MD, PhDb,c,* and Karen E. Cheng, MSa

a Applied Research Associates, Inc., Albuquerque, NM, USA
b McGill Clinical and Health Informatics, Department of Epidemiology and Biostatistics, McGill University, Montreal, Canada
c Institut national de santé publique du Québec, Quebec, Canada

* Correspondence and reprints: David L. Buckeridge, McGill Clinical and Health Informatics, Department of Epidemiology and Biostatistics, McGill University, 1140 Pine Avenue West, Montreal, Quebec H3A 1A3. Tel: (514) 398-8355; Fax: (514) 843-1551. (Email: david.buckeridge{at}mcgill.ca).

Received for publication: 06/12/06; accepted for publication: 10/09/06.

Bioterrorism and emerging infectious diseases such as influenza have spurred research into rapid outbreak detection. One primary thrust of this research has been to identify data sources that provide early indication of a disease outbreak by being leading indicators relative to other established data sources. Researchers tend to rely on the sample cross-correlation function (CCF) to quantify the association between two data sources. Medical informatics researchers have, however, given little consideration to how methodological choices influence the ability of the CCF to identify a lead–lag relationship between time series. We draw on experience from the econometric and environmental health communities, and we use simulation to demonstrate that the sample CCF is highly prone to bias. Specifically, long-scale phenomena tend to overwhelm the CCF, obscuring phenomena at shorter wavelengths. Researchers seeking lead–lag relationships in surveillance data must therefore stipulate the scale length of the features of interest (e.g., short-scale spikes versus long-scale seasonal fluctuations) and then filter the data appropriately to diminish the influence of other features, which may mask the features of interest. Otherwise, conclusions drawn from the sample CCF of bivariate time-series data will inevitably be ambiguous and often altogether misleading.
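The masking effect described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation: the series lengths, amplitudes, the 20-day seasonal offset, the 3-day short-scale lead, and the 29-day moving-average high-pass filter are all assumptions chosen for illustration, and the function names (`sample_ccf`, `high_pass`, `peak_lags`) are hypothetical. Series `x` leads `y` by 3 days at short scale, while the seasonal component of `y` leads that of `x` by 20 days.

```python
import numpy as np

def sample_ccf(x, y, max_lag):
    """Sample CCF: r(k) estimates corr(x[t], y[t+k]) for k in [-max_lag, max_lag]."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    ccf = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            ccf[i] = np.sum(x[:n - k] * y[k:]) / n
        else:
            ccf[i] = np.sum(x[-k:] * y[:n + k]) / n
    return lags, ccf

def high_pass(x, window):
    """Subtract a centered moving average to suppress long-scale variation.

    Assumes an odd window; the zero-padded edges are trimmed from the result.
    """
    kernel = np.ones(window) / window
    trend = np.convolve(x, kernel, mode="same")
    half = window // 2
    return (x - trend)[half:-half]

def peak_lags(seed=0, n=4 * 365, max_lag=30, window=29):
    """Return the lag of the CCF maximum before and after high-pass filtering."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)

    # Assumed long-scale feature: the seasonal cycle in y leads x by 20 days.
    x_season = 10.0 * np.sin(2 * np.pi * t / 365)
    y_season = 10.0 * np.sin(2 * np.pi * (t + 20) / 365)

    # Assumed short-scale feature: x leads y by 3 days, i.e. y[t+3] tracks x[t].
    s = rng.normal(0.0, 1.0, n + 3)
    x = x_season + s[3:]
    y = y_season + s[:n] + rng.normal(0.0, 0.5, n)

    lags, raw = sample_ccf(x, y, max_lag)
    _, filtered = sample_ccf(high_pass(x, window), high_pass(y, window), max_lag)
    return int(lags[np.argmax(raw)]), int(lags[np.argmax(filtered)])

raw_peak, filtered_peak = peak_lags()
print("peak lag, raw series:     ", raw_peak)
print("peak lag, filtered series:", filtered_peak)
```

Under these assumptions, the raw CCF peaks at a negative lag driven by the seasonal offset, so the short-scale lead of `x` is hidden; after high-pass filtering, the peak should move to the short-scale lag. A different choice of filter (for example, differencing or a band-pass filter) would target a different scale length, which is why the scale of interest has to be stated before the CCF is interpreted.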






