Stratified reconstruction of ancestral Escherichia coli diversification

BMC Genomics. 2019 Dec 5;20(1):936. doi: 10.1186/s12864-019-6346-1.

Abstract

Background: Phylogenetic analyses of bacterial genomes based on the simple classification of genes into core and accessory gene pools may offer an incomplete view of evolutionary processes, some of which remain unresolved. A combined strategy based on stratified phylogeny and ancient molecular polymorphisms is proposed to infer detailed evolutionary reconstructions from a large number of whole genomes. This strategy, applied to the largest number of genomes available in public databases, was evaluated for its ability to improve knowledge of the ancient diversification of E. coli. The resulting staggered evolutionary scenario was also used to investigate whether the diversification of the ancient E. coli lineages could be associated with particular lifestyles and adaptive strategies.

Results: Phylogenetic reconstructions exploiting 6220 genomes available in GenBank established an E. coli core genome of 1023 genes, representing about 20% of the complete genome. The combined strategy using stratified phylogeny plus molecular polymorphisms inferred three ancient lineages (D, EB1A and FGB2). Lineage D was the closest to the E. coli root. A staggered diversification could also be proposed within the EB1A and FGB2 lineages and for the phylogroups within these lineages. Several molecular markers suggest that each lineage followed a different adaptive trajectory. The analysis of gained and lost genes in the main lineages showed that carbohydrate utilization functions (uptake and metabolism) were gained principally in the EB1A lineage, whereas the FGB2 lineage lost environmental-adaptive functions but showed more accumulated mutations and ancient recombination events. The population structure of E. coli was re-evaluated by including up to 7561 newly sequenced genomes, which revealed a more complex population structure and led to the proposal of a new phylogroup, phylogroup I.
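
As an illustration of the core versus accessory distinction used above, the following is a minimal sketch that treats the core genome as the set of genes present in every genome and the accessory pool as everything else. It is not the authors' pipeline: the strain names, gene names, and the helper functions `core_genome` and `accessory_genome` are hypothetical, and real analyses rely on ortholog clustering rather than exact name matching. The last line is only a back-of-envelope check of the reported figures (1023 core genes at about 20% of the genome).

```python
def core_genome(gene_sets):
    """Return the genes present in every genome (strict core)."""
    if not gene_sets:
        return set()
    return set.intersection(*map(set, gene_sets.values()))


def accessory_genome(gene_sets):
    """Return the genes present in at least one genome but not in all."""
    pan_genome = set().union(*gene_sets.values())
    return pan_genome - core_genome(gene_sets)


# Hypothetical toy data: three strains with made-up gene names.
genomes = {
    "strain_1": {"geneA", "geneB", "geneC", "geneD"},
    "strain_2": {"geneA", "geneB", "geneD", "geneE"},
    "strain_3": {"geneA", "geneB", "geneF"},
}

core = core_genome(genomes)
accessory = accessory_genome(genomes)
print("core:", sorted(core))
print("accessory:", sorted(accessory))

# Back-of-envelope: 1023 core genes at about 20% implies a complete
# genome on the order of 5000 genes (approximate, since "about 20%").
implied_genome_size = 1023 / 0.20
print("implied genes per genome:", round(implied_genome_size))
```

With thousands of genomes, a strict intersection is sensitive to assembly and annotation gaps, so a relaxed presence threshold is a common alternative; the abstract does not state which criterion was applied.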

Conclusions: A staggered reconstruction of E. coli phylogeny is proposed, indicating evolution from three ancestral lineages to reach all main known phylogroups. New phylogroups were confirmed, suggesting an increasingly complex population structure of E. coli. However, these new phylogroups represent < 1% of the global E. coli population. A few key evolutionary forces have driven the diversification of the two main E. coli lineages: metabolic flexibility in one and colonization-virulence in the other.

Keywords: Ancient reconstruction; Escherichia coli; Evolution; Molecular polymorphism hallmark; Phylogeny; Phylogroups; Stratified phylogeny.

MeSH terms

  • Databases, Genetic
  • Escherichia coli / classification*
  • Escherichia coli / genetics
  • Escherichia coli / pathogenicity
  • Escherichia coli Proteins / genetics*
  • Evolution, Molecular
  • Genome, Bacterial
  • Genomics / methods*
  • Phylogeny
  • Virulence Factors / genetics

Substances

  • Escherichia coli Proteins
  • Virulence Factors