Metagenomic abundance estimation and diagnostic testing on species level

Nucleic Acids Res. 2013 Jan 7;41(1):e10. doi: 10.1093/nar/gks803. Epub 2012 Aug 31.

Abstract

One goal of sequencing-based metagenomic community analysis is the quantitative taxonomic assessment of microbial community compositions. In particular, relative quantification of taxa is highly relevant for metagenomic diagnostics and microbial community comparison. However, the majority of existing approaches quantify at low resolution (e.g. at phylum level), rely on the existence of special genes (e.g. 16S), or have severe problems discerning species with highly similar genome sequences. Yet problems such as metagenomic diagnostics require accurate quantification at the species level. We developed Genome Abundance Similarity Correction (GASiC), a method to estimate true genome abundances via read alignment by considering reference genome similarities in a non-negative LASSO approach. We demonstrate GASiC's superior performance over existing methods on simulated benchmark data as well as on real data. In addition, we present applications to datasets derived from both bacterial DNA and viral RNA. We further discuss our approach as an alternative to PCR-based DNA quantification.
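
The idea of correcting observed read counts for reference genome similarity can be illustrated with a small sketch. This is not the GASiC implementation: it assumes a hypothetical similarity matrix whose entry (i, j) is the fraction of reads from genome j that also align to genome i, and it solves a plain non-negative least-squares problem by projected gradient descent instead of the non-negative LASSO formulation named in the abstract. The function name, the matrix values, and the read counts are all made up for illustration.

```python
def correct_abundances(similarity, observed, steps=5000):
    """Estimate non-negative abundances c such that similarity @ c ~ observed.

    Illustrative only: projected gradient descent on 0.5 * ||A c - r||^2
    with the constraint c >= 0 (a simplification of a non-negative LASSO).
    """
    n = len(observed)
    # 1 / ||A||_F^2 never exceeds 1 / L, where L is the Lipschitz constant
    # of the gradient, so this fixed step size is safe.
    step = 1.0 / sum(a * a for row in similarity for a in row)
    c = [0.0] * n
    for _ in range(steps):
        # residual = A c - r
        residual = [
            sum(similarity[i][j] * c[j] for j in range(n)) - observed[i]
            for i in range(n)
        ]
        # gradient = A^T residual
        grad = [
            sum(similarity[i][j] * residual[i] for i in range(n))
            for j in range(n)
        ]
        # gradient step followed by projection onto c >= 0
        c = [max(0.0, c[j] - step * grad[j]) for j in range(n)]
    return c


# Hypothetical example: genomes 0 and 1 are highly similar, genome 2 is distinct.
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
# Read counts that would be observed if the true abundances were [70, 30, 0]:
# genome 2 receives 10 reads purely through cross-mapping.
observed = [94.0, 86.0, 10.0]

estimated = correct_abundances(similarity, observed)
print([round(x, 2) for x in estimated])  # approximately [70.0, 30.0, 0.0]
```

In this toy case the naive read counts suggest that all three genomes are present, whereas the corrected estimate assigns essentially no abundance to the third genome, which is the kind of false positive that similarity correction is meant to remove.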

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Classification / methods
  • DNA, Bacterial / analysis
  • DNA, Bacterial / chemistry
  • Escherichia coli / genetics
  • Metagenomics / methods*
  • RNA, Viral / analysis
  • RNA, Viral / chemistry
  • Sequence Alignment

Substances

  • DNA, Bacterial
  • RNA, Viral