Thresher: an improved algorithm for peak height thresholding of microbial community profiles

Bioinformatics. 2014 Nov 15;30(22):3257-63. doi: 10.1093/bioinformatics/btu528. Epub 2014 Aug 5.

Abstract

Motivation: This article presents Thresher, an improved technique for finding peak height thresholds for automated rRNA intergenic spacer analysis (ARISA) profiles. We argue that thresholds must be sample dependent, taking community richness into account. In most previous fragment analyses, a common threshold is applied to all samples simultaneously, ignoring richness variations among samples and thereby compromising cross-sample comparison. Our technique solves this problem, and at the same time provides a robust method for outlier rejection, selecting for removal any replicate pairs that are not valid replicates.

Results: Thresholds are calculated individually for each replicate in a pair, and separately for each sample. The thresholds are selected to be the ones that minimize the dissimilarity between the replicates after thresholding. If a choice of threshold results in the two replicates in a pair failing a quantitative test of similarity, either that threshold or that sample must be rejected. We compare thresholded ARISA results with sequencing results, and demonstrate that the Thresher algorithm outperforms conventional thresholding techniques.
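
The selection rule described in the Results can be illustrated with a minimal sketch. It is not the authors' R implementation. It assumes that the two replicate profiles are already aligned into the same fragment-length bins, that dissimilarity is measured with Bray-Curtis on relative peak heights, and that candidate thresholds come from a small fixed grid. The function names, the `min_peaks` guard (which prevents the trivial solution of discarding nearly every peak) and the `max_dissimilarity` cutoff are hypothetical choices made for illustration only.

```python
from itertools import product


def apply_threshold(profile, threshold):
    """Set every peak below the threshold to zero."""
    return [h if h >= threshold else 0.0 for h in profile]


def to_relative(profile):
    """Rescale peak heights so that they sum to 1 (unchanged if all zero)."""
    total = sum(profile)
    return [h / total for h in profile] if total > 0 else list(profile)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two aligned profiles."""
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0  # assumption: two empty profiles count as maximally dissimilar
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def best_thresholds(rep1, rep2, candidates, min_peaks=2, max_dissimilarity=0.3):
    """Search one threshold per replicate, minimizing post-threshold dissimilarity.

    Returns (dissimilarity, threshold1, threshold2), or None when no threshold
    pair passes the similarity test, i.e. the sample would be rejected.
    """
    best = None
    for t1, t2 in product(candidates, repeat=2):
        a = apply_threshold(rep1, t1)
        b = apply_threshold(rep2, t2)
        # Hypothetical guard: skip thresholds that leave too few peaks.
        if sum(h > 0 for h in a) < min_peaks or sum(h > 0 for h in b) < min_peaks:
            continue
        d = bray_curtis(to_relative(a), to_relative(b))
        if best is None or d < best[0]:
            best = (d, t1, t2)
    if best is None or best[0] > max_dissimilarity:
        return None  # replicates fail the similarity test at every threshold
    return best


# Two replicates of one sample: same community, different overall signal,
# each with one small spurious peak.
rep1 = [100.0, 50.0, 5.0, 0.0, 30.0]
rep2 = [200.0, 100.0, 0.0, 8.0, 60.0]
print(best_thresholds(rep1, rep2, candidates=[0, 10, 20]))

# A pair with no shared peaks is rejected at every threshold.
print(best_thresholds([100.0, 0.0, 0.0, 50.0], [0.0, 100.0, 50.0, 0.0], candidates=[0, 10]))
```

In this toy example the search selects a threshold of 10 for both replicates, which removes the small spurious peaks and leaves identical relative profiles. The second pair shares no peaks and is rejected. The published method may differ in how candidate thresholds are generated, in the dissimilarity measure, and in the similarity test it applies.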

Availability and implementation: The software is implemented in R, and the code is available at http://verenastarke.wordpress.com or by contacting the author.

Contact: vstarke@ciw.edu or http://verenastarke.wordpress.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • DNA, Ribosomal Spacer*
  • Environmental Microbiology*
  • Humans
  • Sequence Analysis, DNA
  • Software

Substances

  • DNA, Ribosomal Spacer