A bioinformatics pipeline for sequence-based analyses of fungal biodiversity

Methods Mol Biol. 2011:722:141-55. doi: 10.1007/978-1-61779-040-9_10.

Abstract

The internal transcribed spacer (ITS) is the locus of choice for characterizing fungal diversity in environmental samples. However, methods for analyzing ITS datasets have lagged behind the capacity to generate large amounts of sequence information. Here, we describe our bioinformatics pipeline for processing large fungal ITS sequence datasets, from raw chromatograms to a spreadsheet of operational taxonomic unit (OTU) abundances across samples. Steps include assembling reads that originate from the same clone, identifying primer "barcodes" or "tags," trimming vectors and primers, marking low-quality base calls and removing low-quality sequences, orienting sequences, extracting the ITS region from longer amplicons, and grouping sequences into OTUs. We expect that the principles and tools presented here will remain relevant to datasets arising from new and rapidly evolving sequencing technologies.
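Several of the steps listed above can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not use their tools. The barcode-to-sample map, the primer sequence, the quality thresholds, and all function names are hypothetical values chosen for illustration. Read assembly, vector trimming, and ITS extraction are omitted, and OTUs are formed by grouping identical sequences, a deliberately simple stand-in for the similarity-based clustering a real analysis would need.

```python
from collections import Counter

# All constants below are hypothetical placeholders, not values from the chapter.
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}  # tag -> sample name
PRIMER = "GGAAGTAA"      # forward primer, assumed to directly follow the tag
TAG_LEN = 4              # length of every tag in BARCODES
MIN_QUAL = 20            # base calls below this score are masked as N
MAX_MASKED = 0.2         # discard reads with a larger fraction of masked bases

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(seq, quals):
    """Put a read in forward orientation, judged by the forward primer.

    Returns (seq, quals), or None if the primer is found on neither strand.
    """
    if PRIMER in seq:
        return seq, quals
    rc = reverse_complement(seq)
    if PRIMER in rc:
        return rc, quals[::-1]  # quality scores must be reversed as well
    return None


def process_read(seq, quals):
    """Orient, demultiplex, trim, and quality-mask a single read.

    Returns (sample, masked_insert), or None if the read is rejected.
    """
    oriented = orient(seq, quals)
    if oriented is None:
        return None
    seq, quals = oriented
    start = seq.find(PRIMER)
    # The tag is assumed to sit immediately upstream of the primer.
    sample = BARCODES.get(seq[max(0, start - TAG_LEN):start])
    if sample is None:
        return None
    insert_start = start + len(PRIMER)
    insert = seq[insert_start:]
    insert_quals = quals[insert_start:]
    # Mark low-quality base calls rather than deleting them.
    masked = "".join(
        base if q >= MIN_QUAL else "N" for base, q in zip(insert, insert_quals)
    )
    if not masked or masked.count("N") / len(masked) > MAX_MASKED:
        return None
    return sample, masked


def otu_table(reads):
    """Build an OTU-by-sample abundance table from (seq, quals) pairs.

    Identical trimmed sequences are treated as one OTU (a simplification).
    """
    counts = Counter()
    for seq, quals in reads:
        result = process_read(seq, quals)
        if result is not None:
            sample, insert = result
            counts[(insert, sample)] += 1
    otus = sorted({insert for insert, _ in counts})
    samples = sorted(BARCODES.values())
    return {
        f"OTU_{i + 1}": {s: counts[(insert, s)] for s in samples}
        for i, insert in enumerate(otus)
    }
```

Calling `otu_table` with a list of `(sequence, quality_scores)` pairs returns a nested dictionary keyed by OTU label and then by sample name, which maps directly onto the rows and columns of an abundance spreadsheet.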

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Biodiversity*
  • Computational Biology / methods*
  • DNA, Fungal / analysis
  • DNA, Fungal / genetics
  • DNA, Ribosomal Spacer / analysis
  • Fungi / classification*
  • Fungi / genetics
  • High-Throughput Nucleotide Sequencing / methods*
  • RNA, Ribosomal, 5.8S / genetics
  • Sequence Analysis, DNA / methods*

Substances

  • DNA, Fungal
  • DNA, Ribosomal Spacer
  • RNA, Ribosomal, 5.8S