VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution

Microbiome. 2023 Nov 8;11(1):247. doi: 10.1186/s40168-023-01705-9.

Abstract

Background: Phylogenomic analysis has become an inseparable part of studies of bacterial diversity and evolution, and many different bacterial core genes have been collated and used for phylogenomic tree reconstruction. However, these genes have been selected based on their presence and single-copy ratio in all bacterial genomes, leaving each gene's 'phylogenetic fidelity' unexamined.

Results: From 30,522 complete genomes covering 11,262 species, we examined 148 bacterial core genes that have been previously used for phylogenomic analysis. In addition to the gene presence and single-copy ratios, we evaluated each gene's phylogenetic fidelity by comparing the gene's phylogeny with its corresponding 16S rRNA gene tree. Out of the 148 bacterial genes, 20 validated bacterial core genes (VBCG) were selected as the core gene set with the highest bacterial phylogenetic fidelity. Compared to the larger gene set, the 20-gene core set resulted in more species having all genes present and fewer species with missing data, thereby enhancing the accuracy of phylogenomic analysis. Using Escherichia coli strains as examples of prominent bacterial foodborne pathogens, we demonstrated that the 20 VBCG produced phylogenies with higher fidelity and resolution at species and strain levels, whereas the 16S rRNA gene tree alone could not.
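
The three screening criteria named above (gene presence, single-copy ratio, and agreement between a gene tree and the 16S rRNA gene tree) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the toy copy counts, and the toy trees are hypothetical, and two choices are assumptions: both ratios are taken over all genomes, and tree agreement is measured by counting clades found in only one of two rooted trees (a Robinson-Foulds-style comparison). The abstract does not state which tree-comparison metric was used.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# A rooted tree as nested tuples of leaf names (hypothetical toy representation).
Tree = Union[str, Tuple["Tree", ...]]


def presence_ratio(copy_counts: List[int]) -> float:
    """Fraction of genomes carrying at least one copy of the gene."""
    return sum(1 for c in copy_counts if c >= 1) / len(copy_counts)


def single_copy_ratio(copy_counts: List[int]) -> float:
    """Fraction of genomes carrying exactly one copy (assumed denominator: all genomes)."""
    return sum(1 for c in copy_counts if c == 1) / len(copy_counts)


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets under every internal node, excluding the root's full leaf set."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))
    return found


def clade_distance(gene_tree: Tree, reference_tree: Tree) -> int:
    """Number of clades present in exactly one of the two trees (0 = same topology)."""
    return len(clades(gene_tree) ^ clades(reference_tree))


# Toy data: copies of one gene in four genomes, and two four-taxon trees.
counts = [1, 1, 0, 2]
gene_tree = ((("A", "B"), "C"), "D")
rrna_tree = ((("A", "C"), "B"), "D")

print(presence_ratio(counts), single_copy_ratio(counts))
print(clade_distance(gene_tree, rrna_tree))
```

Under these assumptions, a gene would rank well when both ratios are close to 1 and its clade distance to the 16S rRNA gene tree is small.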

Conclusion: The set of 20 validated core genes improves the fidelity and speed of phylogenomic analysis. Among other uses, this tool improves our ability to explore the evolution, typing and tracking of bacterial strains, such as human pathogens. We have developed a Python pipeline and a desktop graphical app (available on GitHub) for users to perform phylogenomic analysis with high fidelity and resolution.

Keywords: Bacterial core genes; Pathogen typing; Phylogenetic tree; Phylogenomics.

Publication types

  • Video-Audio Media
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bacteria / genetics
  • Genes, Bacterial* / genetics
  • Genome, Bacterial* / genetics
  • Humans
  • Phylogeny
  • RNA, Ribosomal, 16S / genetics

Substances

  • RNA, Ribosomal, 16S