Analysis of 1,000+ Type-Strain Genomes Substantially Improves Taxonomic Classification of Alphaproteobacteria

Front Microbiol. 2020 Apr 7;11:468. doi: 10.3389/fmicb.2020.00468. eCollection 2020.

Abstract

The class Alphaproteobacteria comprises a diverse assemblage of Gram-negative bacteria that includes organisms of varying morphologies, physiologies and habitat preferences, many of which are of clinical and ecological importance. Alphaproteobacteria classification has proved difficult, not least when taxonomic decisions rested heavily on a limited number of phenotypic features and on the interpretation of poorly resolved 16S rRNA gene trees. Despite recent progress in the classification of bacteria assigned to the class, there remains a need to further clarify taxonomic relationships. Here, draft genome sequences of more than 1000 Alphaproteobacteria and outgroup type strains were used to infer phylogenetic trees from genome-scale data using principles drawn from phylogenetic systematics. The majority of taxa were found to be monophyletic, but several orders, families and genera, including taxa recognized as problematic long ago as well as quite recent taxa, and a few species were shown to be in need of revision. Accordingly, proposals are made for the recognition of new orders, families and genera, as well as for the transfer of a variety of species to other genera and of a variety of genera to other families. In addition, emended descriptions are given for many species, mainly involving information on DNA G+C content and (approximate) genome size, both of which are confirmed as valuable taxonomic markers. Similarly, analysis of the gene content was shown to provide valuable taxonomic insights in the class. Significant incongruities between 16S rRNA gene and whole genome trees were not found in the class. The incongruities that became obvious when comparing the results of the present study with existing classifications appeared to be caused mainly by insufficiently resolved 16S rRNA gene trees or incomplete taxon sampling. Another probable cause of past misclassifications is the partly low overall fit of phenotypic characters to the sequence-based tree. Even though a significant degree of phylogenetic conservation was detected in all characters investigated, the overall fit to the tree varied considerably.

Keywords: G+C content; Genome BLAST Distance Phylogeny; chemotaxonomy; genome size; morphology; phylogenetic systematics; phylogenomics.