Optimization of alignment-based methods for taxonomic binning of metagenomics reads

Bioinformatics. 2016 Jun 15;32(12):1779-87. doi: 10.1093/bioinformatics/btw040. Epub 2016 Feb 1.

Abstract

Motivation: Alignment-based taxonomic binning for metagenome characterization proceeds in two steps: mapping reads against a reference database (RDB), then assigning taxonomy according to the best hits. Beyond the sequencing technology and the completeness of the RDB, selecting the workflow configuration that gives the highest binning performance, in particular the mapper parameters and the best hit selection threshold, remains largely empirical.
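
As an illustration of the second step, the sketch below shows one plausible way a best hit selection threshold could act on mapper output. It is not the authors' implementation. The function name `bin_reads`, the `(read_id, taxon, score)` input layout, the positive alignment scores, and the rule "keep hits scoring at least `threshold` times the best score, and assign only when the kept hits agree" are all assumptions made for the example.

```python
from collections import defaultdict


def bin_reads(hits, threshold=0.95):
    """Assign each read to a taxon from its best hits (hypothetical rule).

    hits: iterable of (read_id, taxon, score) tuples with positive scores.
    threshold: assumed fraction of the best score a hit must reach to be kept.
    Returns a dict mapping read_id to a taxon, or to None when the kept
    hits disagree (the read is left unassigned).
    """
    by_read = defaultdict(list)
    for read_id, taxon, score in hits:
        by_read[read_id].append((taxon, score))

    assignments = {}
    for read_id, read_hits in by_read.items():
        best = max(score for _, score in read_hits)
        # Keep every taxon whose hit is close enough to the best hit.
        kept = {taxon for taxon, score in read_hits if score >= threshold * best}
        assignments[read_id] = kept.pop() if len(kept) == 1 else None
    return assignments


# Toy input, invented for illustration only.
example_hits = [
    ("r1", "A", 100), ("r1", "B", 90),
    ("r2", "A", 100), ("r2", "B", 98),
]
strict = bin_reads(example_hits, threshold=0.95)
loose = bin_reads(example_hits, threshold=0.85)
```

In this toy input, lowering the threshold from 0.95 to 0.85 lets the weaker hit for `r1` compete with the best one, so the read goes from assigned to unassigned. This is the kind of sensitivity to the threshold that the abstract refers to.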

Results: We developed a statistical framework to perform such optimization at a minimal computational cost. Using an optimization experimental design and simulated datasets for three sequencing technologies, we built accurate prediction models for five performance indicators and then derived the parameter configuration providing the optimal performance. Regardless of the mapper and the dataset, we observed that the optimal configuration yielded better performance than the default configuration and that the best hit selection threshold had a large impact on performance. Finally, on a reference dataset from the Human Microbiome Project, we confirmed that the optimized configuration increased the performance compared with the default configuration.
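
The general idea of fitting a prediction model on a small set of designed runs and then reading the optimum off the model can be sketched as follows. The abstract does not specify the design or the model family, so the sketch swaps in a 3 x 3 full factorial design in coded units and a second-order polynomial (response surface) fitted by least squares. The two factors, the helper names, and the toy response values are hypothetical and are not results from the paper.

```python
import itertools

import numpy as np


def quadratic_features(X):
    """Build the second-order model matrix for two coded factors."""
    X = np.asarray(X, dtype=float)
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1**2, x2**2])


def fit_response_surface(X, y):
    """Least-squares fit of the quadratic model to the observed indicator."""
    coef, *_ = np.linalg.lstsq(
        quadratic_features(X), np.asarray(y, dtype=float), rcond=None
    )
    return coef


def best_configuration(coef, grid1, grid2):
    """Return the grid point with the highest predicted indicator."""
    candidates = np.array(list(itertools.product(grid1, grid2)))
    predicted = quadratic_features(candidates) @ coef
    i = int(np.argmax(predicted))
    return tuple(candidates[i]), float(predicted[i])


# Hypothetical design: two factors (say, a mapper parameter and the best
# hit threshold), each at coded levels -1, 0, 1.
levels = [-1.0, 0.0, 1.0]
design = np.array(list(itertools.product(levels, levels)))

# Toy performance indicator, invented so the example has a known optimum.
response = 0.9 - 0.1 * (design[:, 0] - 0.5) ** 2 - 0.2 * (design[:, 1] + 0.5) ** 2

coef = fit_response_surface(design, response)
grid = np.linspace(-1.0, 1.0, 41)
best_point, best_value = best_configuration(coef, grid, grid)
```

Only nine configurations are evaluated here, yet the fitted model can be queried at any point in the design region, which is how a model-based approach keeps the computational cost low compared with exhaustively running the mapper on every configuration.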

Availability and implementation: Not applicable.

Contact: magali.dancette@biomerieux.com

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Humans
  • Metagenome
  • Metagenomics*
  • Microbiota
  • Models, Theoretical