Sample size calculation for phylogenetic case linkage

PLoS Comput Biol. 2021 Jul 6;17(7):e1009182. doi: 10.1371/journal.pcbi.1009182. eCollection 2021 Jul.

Abstract

Sample size calculations are an essential component of the design and evaluation of scientific studies. However, there is a lack of clear guidance for determining the sample size needed for phylogenetic studies, which are becoming an essential part of studying pathogen transmission. We introduce a statistical framework for determining the number of true infector-infectee transmission pairs identified by a phylogenetic study, given the size and population coverage of that study. We then show how characteristics of the criteria used to determine linkage and aspects of the study design can influence our ability to correctly identify transmission links, sometimes in counterintuitive ways. We test the overall approach using outbreak simulations and provide guidance for calculating the sensitivity and specificity of the linkage criteria, the key inputs to our approach. The framework is freely available as the R package phylosamp and is broadly applicable to designing and evaluating a wide array of pathogen phylogenetic studies.
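To make the roles of study size, population coverage, sensitivity, and specificity concrete, the following is a minimal back-of-envelope sketch in Python. It is not the framework described in the paper and does not reproduce the phylosamp API; the function name, the field names, and the simplifying assumptions (every case has exactly one infector, drawn uniformly from the other infections, and the linkage criteria are applied independently to every pair of sampled cases) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LinkageEstimate:
    true_pairs: float       # expected true infector-infectee pairs in the sample
    true_positives: float   # expected true pairs flagged as linked
    false_positives: float  # expected unlinked pairs flagged as linked
    ppv: float              # share of flagged pairs that are true pairs


def expected_linkage(n_infected, n_sampled, sensitivity, specificity):
    """Crude expected linkage counts (illustrative assumptions only).

    Assumes each sampled case has exactly one infector, equally likely to be
    any of the other n_infected - 1 infections. This ignores the index case
    and any real transmission structure.
    """
    if n_infected < 3 or not 2 <= n_sampled <= n_infected:
        raise ValueError("need n_infected >= 3 and 2 <= n_sampled <= n_infected")
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")

    # All unordered pairs of sampled cases that the criteria get applied to.
    total_pairs = n_sampled * (n_sampled - 1) / 2
    # Chance that a sampled case's infector is also among the sampled cases.
    p_infector_sampled = (n_sampled - 1) / (n_infected - 1)
    true_pairs = n_sampled * p_infector_sampled

    true_positives = sensitivity * true_pairs
    false_positives = (1.0 - specificity) * (total_pairs - true_pairs)
    flagged = true_positives + false_positives
    ppv = true_positives / flagged if flagged > 0 else float("nan")
    return LinkageEstimate(true_pairs, true_positives, false_positives, ppv)


if __name__ == "__main__":
    # Hypothetical study: 200 of 1000 infections sequenced.
    print(expected_linkage(1000, 200, sensitivity=0.9, specificity=0.999))
```

Under these assumptions the expected number of true pairs in the sample grows roughly with the square of the sample size for a fixed outbreak size, and even a small loss of specificity produces many false links because unlinked pairs vastly outnumber true ones. The published framework treats these questions under more carefully stated assumptions.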

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / classification
  • Bacteria / genetics
  • Computational Biology / methods*
  • Genetic Linkage / genetics
  • Humans
  • Infections / microbiology
  • Infections / transmission
  • Infections / virology
  • Phylogeny*
  • Research Design
  • Sample Size*
  • Sensitivity and Specificity
  • Viruses / classification
  • Viruses / genetics

Grants and funding

Funding was provided by the Bill and Melinda Gates Foundation, grant OPP1195157 (S.W. and J.L.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.