Transmission bottleneck size estimation from de novo viral genetic variation

bioRxiv [Preprint]. 2023 Aug 14:2023.08.14.553219. doi: 10.1101/2023.08.14.553219.

Abstract

Sequencing of viral infections has become increasingly common over the last decade. Deep sequencing data in particular have proven useful in characterizing the roles that genetic drift and natural selection play in shaping within-host viral populations. They have also been used to estimate transmission bottleneck sizes from identified donor-recipient pairs. These bottleneck sizes quantify the number of viral particles that establish genetic lineages in the recipient host and are important to estimate because of their impact on viral evolution. Current approaches for estimating bottleneck sizes exclusively consider the subset of viral sites that are observed as polymorphic in the donor individual. However, allele frequencies can change dramatically over the course of an individual's infection. Sites that are polymorphic in the donor at the time of transmission may no longer be polymorphic in the donor at the time of sampling, and allele frequencies at donor-polymorphic sites may also change dramatically over the course of the recipient's infection. Because of this, transmission bottleneck sizes estimated using allele frequencies observed at a donor's polymorphic sites may considerably underestimate true bottleneck sizes. Here, we present a new statistical approach that instead estimates bottleneck sizes using patterns of viral genetic variation that arose de novo within a recipient individual. Specifically, our approach makes use of the number of clonal viral variants observed in a transmission pair, defined as the number of viral sites that are monomorphic in both the donor and the recipient but carry different alleles. We first test our approach on a simulated dataset and then apply it to both influenza A virus sequence data and SARS-CoV-2 sequence data from identified transmission pairs. Our results confirm the existence of extremely tight transmission bottlenecks for these two respiratory viruses, using an approach that does not tend to underestimate transmission bottleneck sizes.
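
The definition of a clonal variant given above can be illustrated with a minimal sketch. This is not the authors' implementation, and it covers only the counting step, not the statistical estimation of the bottleneck size. The input layout (a mapping from site position to allele frequencies), the function names, and the `min_freq` cutoff of 0.97 for calling a site monomorphic are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical input layout: site position -> {allele: frequency}.
SiteFreqs = Dict[int, Dict[str, float]]


def monomorphic_allele(site: Dict[str, float], min_freq: float) -> Optional[str]:
    """Return the dominant allele if the site counts as monomorphic, else None.

    A site is treated as monomorphic when its most frequent allele has
    frequency >= min_freq (an illustrative cutoff, not taken from the paper).
    """
    if not site:
        return None
    allele, freq = max(site.items(), key=lambda item: item[1])
    return allele if freq >= min_freq else None


def count_clonal_variants(donor: SiteFreqs, recipient: SiteFreqs,
                          min_freq: float = 0.97) -> int:
    """Count sites monomorphic in both hosts that carry different alleles."""
    count = 0
    # Only sites covered in both samples can be compared.
    for pos in donor.keys() & recipient.keys():
        d = monomorphic_allele(donor[pos], min_freq)
        r = monomorphic_allele(recipient[pos], min_freq)
        if d is not None and r is not None and d != r:
            count += 1
    return count


# Toy data (invented for illustration only).
donor = {
    10: {"A": 1.0},
    20: {"C": 0.6, "T": 0.4},   # polymorphic in donor: not counted
    30: {"G": 1.0},
    40: {"T": 1.0},
}
recipient = {
    10: {"A": 1.0},             # same allele as donor: not counted
    20: {"T": 1.0},
    30: {"A": 0.99, "G": 0.01}, # fixed for a different allele: clonal variant
    40: {"T": 0.5, "C": 0.5},   # polymorphic in recipient: not counted
}

n_clonal = count_clonal_variants(donor, recipient)
```

With this toy data, only site 30 meets the definition, so `n_clonal` is 1. In practice the cutoff for calling a site monomorphic would depend on the variant-calling threshold used for the sequencing data.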

Publication types

  • Preprint