ASAP 2: a pipeline and web server to analyze marker gene amplicon sequencing data automatically and consistently

BMC Bioinformatics. 2022 Jan 6;23(1):27. doi: 10.1186/s12859-021-04555-0.

Abstract

Background: Amplicon sequencing of marker genes such as 16S rDNA has been widely used to survey and characterize microbial communities. However, the complex data analyses have required many manual intervention steps, which often lead to inconsistent results.

Results: Here, we have developed a pipeline, amplicon sequence analysis pipeline 2 (ASAP 2), that automates the analysis without the usual manual inspection and user intervention, for instance in the detection of barcode orientation, the selection of the high-quality region of reads, and the determination of resampling depth, among others. The pipeline integrates the analytical processes, including data import, demultiplexing, read profile summarization, quality trimming, denoising, chimeric sequence removal and feature table construction. It accepts multiple input formats: multiplexed or demultiplexed, paired-end or single-end, barcode inside or outside, and raw or intermediate data (e.g. a feature table). The outputs include taxonomic classification, alpha/beta diversity, community composition, ordination analysis and statistical tests. ASAP 2 also supports merging multiple sequencing runs, which helps integrate and compare data from different sources (public databases and collaborators).
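
To illustrate how a step such as the determination of resampling depth could be automated, the following is a minimal sketch and not ASAP 2's actual implementation. It assumes one common heuristic: choose the depth that maximizes the total number of reads retained after resampling (depth multiplied by the number of samples with at least that many reads). The function names, the min_depth parameter and the sample counts are hypothetical and used for illustration only.

```python
from typing import Dict, List


def choose_resampling_depth(read_counts: Dict[str, int], min_depth: int = 1) -> int:
    """Return the depth that maximizes reads retained after resampling.

    Assumed heuristic (not taken from ASAP 2): every sample with at least
    `depth` reads is kept and subsampled to `depth`, so the retained total
    is depth * (number of samples kept). Only observed per-sample read
    counts need to be tried as candidate depths.
    """
    candidates = sorted({c for c in read_counts.values() if c >= min_depth})
    if not candidates:
        raise ValueError("no sample reaches min_depth")

    best_depth, best_retained = candidates[0], -1
    for depth in candidates:
        kept = sum(1 for c in read_counts.values() if c >= depth)
        retained = depth * kept
        # Strict '>' keeps the lower depth on ties, which keeps more samples.
        if retained > best_retained:
            best_depth, best_retained = depth, retained
    return best_depth


def samples_kept(read_counts: Dict[str, int], depth: int) -> List[str]:
    """Return the sorted names of samples that survive resampling at `depth`."""
    return sorted(name for name, c in read_counts.items() if c >= depth)


if __name__ == "__main__":
    # Hypothetical per-sample read counts from a feature table.
    counts = {"A": 12000, "B": 15000, "C": 300, "D": 11000}
    depth = choose_resampling_depth(counts)
    print(depth, samples_kept(counts, depth))
```

With the hypothetical counts above, the low-coverage sample C is dropped because keeping it would force every sample down to 300 reads. A real pipeline might add further criteria, such as a minimum number of samples to retain per group.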

Conclusions: Our pipeline minimizes hands-on intervention and runs amplicon sequence variant (ASV)-based amplicon sequencing analysis automatically and consistently. Our web server assists researchers who have no access to high-performance computing (HPC) resources or who have limited bioinformatics skills. The pipeline and web server can be accessed at https://github.com/tianrenmaogithub/asap2 and https://hts.iit.edu/asap2, respectively.

Keywords: 16S rRNA; Amplicon sequencing; Marker gene; Pipeline; Web server.

MeSH terms

  • Computational Biology
  • Computers
  • High-Throughput Nucleotide Sequencing*
  • RNA, Ribosomal, 16S
  • Sequence Analysis, DNA
  • Software*

Substances

  • RNA, Ribosomal, 16S