Toward a more robust assessment of intraspecies diversity, using fewer genetic markers

Appl Environ Microbiol. 2006 Nov;72(11):7286-93. doi: 10.1128/AEM.01398-06. Epub 2006 Sep 15.

Abstract

Phylogenetic sequence analysis of single or multiple genes has dominated the study and census of the genetic diversity among closely related bacteria. It remains unclear, however, how the results based on a few genes in the genome correlate with whole-genome-based relatedness and what genes (if any) best reflect whole-genome-level relatedness and hence should be preferentially used to economize on cost and to improve accuracy. We show here that phylogenies of closely related organisms based on the average nucleotide identity (ANI) of their shared genes correspond accurately to phylogenies based on state-of-the-art analysis of their whole-genome sequences. We use ANI to evaluate the phylogenetic robustness of every gene in the genome and show that almost all core genes, regardless of their functions and positions in the genome, offer robust phylogenetic reconstruction among strains that show 80 to 95% ANI (16S rRNA identity, >98.5%). Lack of elapsed time and, to a lesser extent, horizontal transfer and recombination make the selection of genes more critical for applications that target the intraspecies level, i.e., strains that show >95% ANI according to current standards. A much more accurate phylogeny for the Escherichia coli group was obtained based on just three best-performing genes according to our analysis compared to the concatenated alignment of eight genes that are commonly employed for phylogenetic purposes in this group. Our results are reproducible within the Salmonella, Burkholderia, and Shewanella groups and therefore are expected to have general applicability for microevolution studies, including metagenomic surveys.
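To make the ANI measure concrete, the following is a minimal sketch of how an average nucleotide identity over shared genes could be computed, assuming each shared gene has already been pairwise aligned between the two genomes. The abstract does not describe the authors' actual pipeline, so everything here is an illustrative assumption: the function names, the toy gene names, the choice to skip gapped columns, and the unweighted mean across genes. The band labels only restate the 80 to 95% and >95% ANI ranges mentioned in the abstract.

```python
from statistics import mean


def gene_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical bases over columns where neither sequence has a gap.

    Assumption: the two sequences are already aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gapped columns are ignored in this sketch
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


def average_nucleotide_identity(shared_genes: dict) -> float:
    """Unweighted mean identity, in percent, over all shared genes.

    shared_genes maps a gene name to a pair (aligned_a, aligned_b).
    """
    if not shared_genes:
        raise ValueError("at least one shared gene is required")
    return 100.0 * mean(gene_identity(a, b) for a, b in shared_genes.values())


def relatedness_band(ani_percent: float) -> str:
    """Label a genome pair using the ANI ranges quoted in the abstract."""
    if ani_percent > 95.0:
        return "intraspecies (>95% ANI)"
    if ani_percent >= 80.0:
        return "80 to 95% ANI"
    return "below 80% ANI"


# Hypothetical toy alignments, not real data.
toy_genes = {
    "geneA": ("ACGTACGTAC", "ACGTACGTAC"),
    "geneB": ("ACGTACGTAC", "ACGTACGTAT"),
    "geneC": ("ACGT-CGTAC", "ACGTACGTAA"),
}

if __name__ == "__main__":
    ani = average_nucleotide_identity(toy_genes)
    print(f"ANI = {ani:.2f}% ({relatedness_band(ani)})")
```

A length-weighted mean, or identities taken from BLAST matches rather than global alignments, would be equally plausible design choices; the sketch uses the simplest form so that the definition stays visible.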

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bacterial Proteins / genetics*
  • Burkholderia / classification
  • Burkholderia / genetics
  • Escherichia coli / classification
  • Escherichia coli / genetics
  • Genetic Markers*
  • Genetic Variation*
  • Genome, Bacterial*
  • Phylogeny*
  • Proteobacteria / classification*
  • Proteobacteria / genetics
  • Reproducibility of Results
  • Salmonella / classification
  • Salmonella / genetics
  • Sequence Analysis, DNA
  • Shewanella / classification
  • Shewanella / genetics
  • Species Specificity

Substances

  • Bacterial Proteins
  • Genetic Markers