Quantifying pathogen surveillance using temporal genomic data

Joseph M Chan; Raul Rabadan

doi:10.1128/mBio.00524-12

mBio. 2013 Jan 29;4(1):e00524-12. doi: 10.1128/mBio.00524-12.

Authors

Joseph M Chan¹, Raul Rabadan

Affiliation

¹ Center for Computational Biology and Bioinformatics, Columbia University College of Physicians and Surgeons, New York, New York, USA. jmc2213@columbia.edu

Abstract

With the advent of deep sequencing, genomic surveillance has become a popular method for detection of infectious disease, supplementing information gathered by classic clinical or serological techniques to identify host-determinant markers and trace the origin of transmission. However, two main factors complicate genomic surveillance. First, pathogens exhibiting high genetic diversity demand higher levels of scrutiny to obtain an accurate representation of the entire population. Second, current systems of detection are nonuniform, with significant gaps in certain geographic locations and animal reservoirs. Despite unforeseen pandemics such as the one caused by the 2009 swine-origin H1N1 influenza virus, there is no standardized way of evaluating surveillance. A more complete surveillance system should capture a greater proportion of pathogen diversity. Here we present a novel quantitative method of assessing the completeness of genomic surveillance that incorporates the time of sequence collection, as well as the pathogen's evolutionary rate. We propose the q2 coefficient, which measures the proportion of sequenced isolates whose closest neighbor in the past is within a genetic distance equivalent to 2 years of evolution, roughly the median time between changes in strain selection for influenza A vaccines. Easily interpretable and significantly faster than other methods, the q2 coefficient requires no full phylogenetic characterization or use of arbitrary clade definitions. Application of the q2 coefficient to influenza A virus confirmed poor sampling of swine and avian populations and identified regions with deficient surveillance. We demonstrate that the q2 coefficient can not only be applied to other pathogens, including dengue and West Nile viruses, but also used to describe surveillance dynamics, particularly the effects of different public health policies.

Importance: Surveillance programs have become key assets in determining the emergence or prevalence of pathogens circulating in human and animal populations. Genomic surveillance, in particular, provides comprehensive information on the history of isolates and potential molecular markers for infectivity and pathogenicity. Current techniques for evaluating genomic surveillance are inaccurate, ignoring the pathogen's evolutionary rate and biodiversity, as well as the timing of sequence collection. Using sequence data, we propose the q2 coefficient as a quantitative measure of surveillance completeness that combines elements of time and evolution without defining arbitrary criteria for clades or species. Through several case studies of influenza A, dengue, and West Nile viruses, we employed the q2 coefficient to identify sampling deficiencies in different host species and locations, as well as to examine the effects of different public health policies through historical records of the q2 coefficient. These results can guide public health agencies to focus resource allocation and virus collection to address specific deficiencies in surveillance.
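
The definition of the q2 coefficient given in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions that the abstract does not specify: genetic distance is taken as the simple proportion of differing sites between aligned sequences, the threshold is taken as the evolutionary rate (substitutions per site per year) multiplied by 2 years, and isolates with no earlier isolate to compare against are left out of the proportion. The names `Isolate`, `p_distance`, and `q_coefficient` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Isolate:
    name: str
    year: float      # collection time in decimal years
    sequence: str    # aligned nucleotide sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (assumed distance measure)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def q_coefficient(isolates, rate_per_site_per_year, years=2.0):
    """Proportion of isolates whose closest earlier isolate lies within
    the distance expected after `years` of evolution.

    Assumption: threshold = rate * years. Isolates with no earlier
    isolate are skipped rather than counted as uncovered.
    """
    threshold = rate_per_site_per_year * years
    covered = 0
    evaluated = 0
    for iso in isolates:
        # Only isolates collected strictly earlier count as "in the past".
        earlier = [other for other in isolates if other.year < iso.year]
        if not earlier:
            continue
        evaluated += 1
        nearest = min(p_distance(iso.sequence, o.sequence) for o in earlier)
        if nearest <= threshold:
            covered += 1
    if evaluated == 0:
        raise ValueError("no isolate has an earlier isolate to compare with")
    return covered / evaluated
```

Because each isolate is compared directly with earlier isolates, the sketch needs no phylogenetic tree or clade definitions, which matches the property the abstract emphasizes. The pairwise loop is quadratic in the number of isolates; a large data set would need a faster nearest-neighbor search.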

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Animals
Epidemiological Monitoring*
Evolution, Molecular
Genomics / methods*
Humans
Molecular Epidemiology / methods*
Quality Control*
Time Factors
Virus Diseases / epidemiology
Virus Diseases / virology
Viruses / genetics
Viruses / isolation & purification
