Statistical and Computational Methods for Microbial Strain Analysis

Methods Mol Biol. 2023:2629:231-245. doi: 10.1007/978-1-0716-2986-4_11.

Abstract

A microbial strain is interpreted as a lineage that derives from a recent ancestor and has not experienced "too many" recombination events; such strains can be retrieved with culture-independent techniques using metagenomic sequencing. Strain-level variability has increasingly been shown to underlie phenotypic heterogeneity that affects host health, including differences in virulence, transmissibility, and antibiotic resistance. New statistical and computational methods have recently been developed to track strains in samples from shotgun metagenomic data, using either reference genome sequences or metagenome-assembled genomes (MAGs). In this paper, we review some recent statistical methods for strain identification based on frequency counts at a set of single nucleotide variants (SNVs) within a set of single-copy marker genes. These methods differ in whether reference genome sequences are needed, how SNVs are called, which deconvolution methods are used, and whether the methods can be applied to multiple samples. We conclude our review with areas that require further research.
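
To illustrate what deconvolution of SNV frequency counts can look like in the simplest reference-based setting, the sketch below estimates strain proportions in one sample with an expectation-maximization (EM) loop. It is not taken from any of the reviewed methods. It assumes that the strain haplotypes are already known, that reads at each SNV are drawn independently from strains in proportion to their abundance, and that a single per-read error rate applies; the function name, its parameters, and the example counts are hypothetical.

```python
def estimate_strain_proportions(haplotypes, alt_counts, ref_counts,
                                n_iter=200, error_rate=0.01):
    """Estimate strain proportions from allele counts at SNV sites.

    haplotypes[i][k] is 1 if (assumed known) strain k carries the
    alternative allele at SNV i, and 0 otherwise.
    alt_counts[i] and ref_counts[i] are the read counts supporting the
    alternative and reference allele at SNV i.
    """
    n_strains = len(haplotypes[0])
    if sum(alt_counts) + sum(ref_counts) == 0:
        raise ValueError("no reads observed at any SNV")

    # Start from equal proportions.
    pi = [1.0 / n_strains] * n_strains

    for _ in range(n_iter):
        expected = [0.0] * n_strains  # expected number of reads per strain
        for h_row, alt, ref in zip(haplotypes, alt_counts, ref_counts):
            # Probability that a read from strain k shows the alt allele.
            p_alt = [1.0 - error_rate if h == 1 else error_rate for h in h_row]
            # E-step: unnormalized responsibility of each strain for
            # alt-supporting and ref-supporting reads at this site.
            w_alt = [p * a for p, a in zip(pi, p_alt)]
            w_ref = [p * (1.0 - a) for p, a in zip(pi, p_alt)]
            s_alt, s_ref = sum(w_alt), sum(w_ref)
            for k in range(n_strains):
                expected[k] += alt * w_alt[k] / s_alt + ref * w_ref[k] / s_ref
        # M-step: proportions are the normalized expected read counts.
        total = sum(expected)
        pi = [e / total for e in expected]
    return pi


if __name__ == "__main__":
    # Hypothetical example: two strains, three SNVs in marker genes.
    haplotypes = [[1, 0], [0, 1], [1, 1]]
    print(estimate_strain_proportions(haplotypes, [70, 30, 100], [30, 70, 0]))
```

Sites at which all strains carry the same allele, such as the third SNV in the example, contribute nothing to the estimate. Reference-free approaches must additionally infer the haplotypes themselves, which this sketch does not attempt.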

Keywords: Deconvolution; Genetic polymorphisms; Genome assembly; Haplotype; Strain typing.

Publication types

  • Review

MeSH terms

  • Metagenome
  • Metagenomics / methods
  • Microbiota* / genetics
  • Sequence Analysis, DNA / methods