Accuracy and efficiency of algorithms for the demarcation of bacterial ecotypes from DNA sequence data

Int J Bioinform Res Appl. 2014;10(4-5):409-25. doi: 10.1504/IJBRA.2014.062992.

Abstract

Identification of closely related, ecologically distinct populations of bacteria would benefit microbiologists working in many fields including systematics, epidemiology and biotechnology. Several laboratories have recently developed algorithms aimed at demarcating such 'ecotypes'. We examine the ability of four of these algorithms to correctly identify ecotypes from sequence data. We tested the algorithms on synthetic sequences, with known history and habitat associations, generated under the stable ecotype model, and on data from Bacillus strains isolated from Death Valley, where previous work has confirmed the existence of multiple ecotypes. We found that one of the algorithms (ecotype simulation) performs significantly better than the others (AdaptML, GMYC, BAPS) in both instances. It is, however, also the least efficient of the four: by a large margin, it is the slowest of the algorithms tested. Attempts at improving its efficiency are underway.

Keywords: AdaptML; BAPS; Bacillus strains; DNA sequences; GMYC; bacterial ecotypes; bioinformatics; demarcation algorithms; ecotype simulation; stable ecotype model.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Bacillus / classification*
  • Bacillus / genetics
  • Computational Biology / methods*
  • Ecotype*
  • Genes, Bacterial
  • Models, Statistical
  • Sequence Analysis, DNA / methods*
  • Software
  • Species Specificity