Unique k-mers as Strain-Specific Barcodes for Phylogenetic Analysis and Natural Microbiome Profiling

Int J Mol Sci. 2020 Jan 31;21(3):944. doi: 10.3390/ijms21030944.

Abstract

The need for comparative analysis of natural metagenomes has stimulated the development of new methods for their taxonomic profiling. Alignment-free approaches based on the search for marker k-mers have proved capable of identifying not only species but also strains of microorganisms with known genomes. Here, we evaluated the ability of genus-specific k-mers to distinguish eight phylogroups of Escherichia coli (A, B1, C, E, D, F, G, B2) and assessed the presence of their unique 22-mers in clinical samples from the microbiomes of four healthy people and four patients with Crohn's disease. We found that a phylogenetic tree inferred from the pairwise distance matrix for unique 18-mers and 22-mers of 124 genomes was fully consistent with the topology of the tree obtained from concatenated aligned sequences of orthologous genes. Therefore, we propose strain-specific "barcodes" for rapid phylotyping. Using unique 22-mers for taxonomic analysis, we detected microbes of all groups in human microbiomes; however, their presence in the five samples was significantly different. This result points to the intraspecies heterogeneity of E. coli in the natural microflora and indicates the feasibility of further studies of the role of this heterogeneity in maintaining population homeostasis.
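
The workflow summarized in the abstract (collect k-mers per genome, keep those unique to one genome as a "barcode", build a pairwise distance matrix, then look for barcode k-mers in metagenomic reads) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy sequences, and the use of Jaccard distance are all assumptions made for illustration; the abstract does not specify the distance measure. The sketch also scans the forward strand only, whereas real data would likely need reverse complements handled as well.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-length substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(genomes, k):
    """Map each genome name to the k-mers found in that genome and in no other."""
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    # Count in how many genomes each k-mer occurs.
    counts = Counter(km for s in sets.values() for km in s)
    return {name: {km for km in s if counts[km] == 1} for name, s in sets.items()}


def pairwise_distances(genomes, k):
    """Jaccard distance between k-mer sets for every genome pair.

    Jaccard distance is an assumed stand-in, not the paper's stated metric.
    """
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    dist = {}
    for a, b in combinations(sorted(sets), 2):
        union = sets[a] | sets[b]
        shared = sets[a] & sets[b]
        dist[(a, b)] = 1 - len(shared) / len(union) if union else 0.0
    return dist


def profile_reads(reads, barcodes, k):
    """Count, per genome, how many distinct barcode k-mers occur in each read."""
    hits = {name: 0 for name in barcodes}
    for read in reads:
        read_kmers = kmers(read, k)
        for name, barcode in barcodes.items():
            hits[name] += len(read_kmers & barcode)
    return hits


# Toy sequences and reads, invented for illustration only (k=4 instead of 18 or 22).
toy = {
    "strain_A": "ACGTACGTGA",
    "strain_B": "ACGTACGTTT",
    "strain_C": "GGGTACGTGA",
}
barcodes = unique_kmers(toy, k=4)
distances = pairwise_distances(toy, k=4)
hits = profile_reads(["TTCGTTAA", "GGGTAC"], barcodes, k=4)
```

A distance dictionary of this kind could be passed to any standard tree-building method (for example neighbor joining) to compare the resulting topology with a gene-alignment tree. For 124 genomes and k = 18 or 22, in-memory Python sets would probably be too slow, and a dedicated k-mer counter would be the more realistic choice.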

Keywords: alignment-free algorithms; bacterial genomes; genome barcodes; human microbiome; k-mers; metagenomes; phylogenetic trees; phylotyping; taxonomic profiling.

MeSH terms

  • Algorithms
  • Case-Control Studies
  • Computational Biology
  • Crohn Disease / genetics*
  • Crohn Disease / microbiology
  • DNA Barcoding, Taxonomic / methods*
  • Escherichia coli / classification
  • Escherichia coli / genetics*
  • Escherichia coli / isolation & purification
  • Escherichia coli Infections / genetics*
  • Escherichia coli Infections / microbiology
  • Genes, Bacterial*
  • Genome, Bacterial*
  • Humans
  • Metagenome
  • Microbiota*