Evaluating the number of different genomes in a metagenome by means of the compositional spectra approach

PLoS One. 2020 Nov 6;15(11):e0237205. doi: 10.1371/journal.pone.0237205. eCollection 2020.

Abstract

Determining the composition of a metagenome remains one of the most interesting problems in bioinformatics. It involves a wide range of mathematical methods, from probabilistic and combinatorial models to cluster analysis and pattern recognition techniques. Continued progress in rapid sequencing and in fast, precise metagenome analysis will increase the diagnostic value of human metagenomes in both health and disease. This article presents the theoretical foundations of an algorithm for calculating the number of different genomes in the medium under study. The approach is based on analysis of the compositional spectra of successively sequenced samples of the medium; its essential feature is the use of random fluctuations in the numbers of bacteria across different samples of the same metagenome. The possibility of implementing the algorithm effectively in the presence of data errors is also discussed. The work describes the algorithm for evaluating a metagenome, including estimating the number of genomes and identifying genomes with known compositional spectra. It should be emphasized that estimating the number of genomes in a metagenome can always be helpful, regardless of the technique used to separate the metagenome, such as clustering of sequencing results or marker analysis.
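The abstract does not spell out the algorithm, so the following is only a minimal sketch under a stated assumption: that the compositional spectrum of each sample (here, the vector of relative k-mer frequencies) is a mixture of the spectra of the genomes it contains, with abundance weights that fluctuate randomly from sample to sample. Under that assumption, enough samples span the same linear space as the genome spectra, so the number of distinct genomes can be estimated as the numerical rank of the set of sample spectra, and a genome with a known spectrum can be tested for presence by checking whether its spectrum lies in that span. The function names, k = 2, the tolerance `rel_tol`, and the simulated genomes are all hypothetical illustrations, not the authors' implementation.

```python
import random
from itertools import product


def compositional_spectrum(seq, k=2):
    """Relative frequencies of all 4**k k-mers over ACGT, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip k-mers containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in kmers]


def random_mixture(spectra, rng):
    """Hypothetical sample model: random positive abundances, normalised to sum to 1."""
    weights = [rng.random() + 0.05 for _ in spectra]
    total = sum(weights)
    return [sum(w * s[j] for w, s in zip(weights, spectra)) / total
            for j in range(len(spectra[0]))]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _remove_component(v, q):
    """Subtract from v its projection onto the unit vector q."""
    c = _dot(v, q)
    return [a - c * b for a, b in zip(v, q)]


def orthonormal_basis(vectors, rel_tol=1e-6):
    """Pivoted (modified) Gram-Schmidt: orthonormal basis of span(vectors).

    A new direction is kept only while the largest remaining residual norm
    exceeds rel_tol times the largest input norm; with noisy data rel_tol
    must be set above the noise level. len(result) is the estimated rank.
    """
    scale = max(_dot(v, v) ** 0.5 for v in vectors)
    residuals = [list(v) for v in vectors]
    basis = []
    while residuals:
        norms = [_dot(r, r) ** 0.5 for r in residuals]
        i = max(range(len(norms)), key=norms.__getitem__)
        if norms[i] <= rel_tol * scale:
            break  # everything left is (numerically) in the current span
        q = [a / norms[i] for a in residuals.pop(i)]
        basis.append(q)
        residuals = [_remove_component(r, q) for r in residuals]
    return basis


def in_span(spectrum, basis, rel_tol=1e-6):
    """True if a known spectrum is numerically a linear combination of the basis."""
    r = list(spectrum)
    for q in basis:
        r = _remove_component(r, q)
    return _dot(r, r) ** 0.5 <= rel_tol * _dot(spectrum, spectrum) ** 0.5


rng = random.Random(1)


def random_genome(base_probs, length=5000):
    """Hypothetical toy genome: i.i.d. bases with the given A, C, G, T probabilities."""
    return "".join(rng.choices("ACGT", weights=base_probs, k=length))


# Three simulated genomes (AT-rich, GC-rich, balanced) mixed into eight samples.
genomes = [random_genome(p) for p in ([0.4, 0.1, 0.1, 0.4],
                                      [0.1, 0.4, 0.4, 0.1],
                                      [0.25, 0.25, 0.25, 0.25])]
spectra = [compositional_spectrum(g) for g in genomes]
samples = [random_mixture(spectra, rng) for _ in range(8)]

basis = orthonormal_basis(samples)
print("estimated number of genomes:", len(basis))
print("genome 1 present:", in_span(spectra[0], basis))
absent = compositional_spectrum(random_genome([0.1, 0.1, 0.4, 0.4]))
print("unrelated genome present:", in_span(absent, basis))
```

Pivoted Gram-Schmidt is used here only to keep the sketch dependency-free; on real, noisy sample spectra a singular value decomposition (for example `numpy.linalg.svd`) gives a more robust rank estimate, with the cut-off placed above the noise floor. Under this model the estimate can never exceed 4**k, since the spectra live in a 4**k-dimensional space, so larger k is needed for richer communities.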

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms*
  • Bacteria / classification*
  • Bacteria / genetics*
  • Computational Biology / methods*
  • Humans
  • Metagenome*
  • Phylogeny
  • Sequence Analysis, DNA / methods*

Grants and funding

The authors received no specific funding for this work: no funding or grants for financial or material support were received from any organization, including the authors' institutions, and no author received a salary from any funder.