AUSPP: A universal short-read pre-processing package

J Bioinform Comput Biol. 2019 Dec;17(6):1950037. doi: 10.1142/S0219720019500379.

Abstract

Many short-read aligners can map short reads to a reference genome or sequence, and most of them accept a FASTQ file directly as the input query file. However, the raw data usually need to be pre-processed first, and few software programs specialize in pre-processing raw data generated by the variety of next-generation sequencing (NGS) technologies. Here, we present AUSPP, a Perl script-based pipeline for pre-processing and automatic mapping of NGS short reads. The pipeline encompasses quality control, adaptor trimming, collapsing of reads, structural RNA removal, length selection, read mapping, and normalized wiggle file creation. It streamlines processing from raw data to genome mapping and is therefore a powerful tool for the steps that precede meta-analysis. Most importantly, because AUSPP has default pipeline settings for many types of NGS data, users usually need to provide only the raw data and the genome. AUSPP is portable and easy to install, and the source code is freely available at https://github.com/highlei/AUSPP.
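
To make three of the steps named above more concrete (adaptor trimming, length selection, and collapsing of reads), the following is a minimal Python sketch. It is not AUSPP's code, which is written in Perl, and it does not reproduce AUSPP's options or defaults. The function names, the `min_overlap`, `min_len`, and `max_len` parameters, and their default values are all hypothetical. The abstract does not say how the wiggle files are normalized, so the reads-per-million scaling shown here is only one common choice, assumed for illustration.

```python
from collections import Counter


def trim_adaptor(read, adaptor, min_overlap=5):
    """Remove a 3' adaptor from a read (exact matching only, no mismatches).

    Hypothetical helper: a real trimmer would also tolerate sequencing errors.
    """
    # Case 1: the full adaptor occurs somewhere in the read.
    idx = read.find(adaptor)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a prefix of the adaptor (longest prefix first).
    for k in range(min(len(adaptor), len(read)), min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    # No adaptor found: return the read unchanged.
    return read


def preprocess(reads, adaptor, min_len=18, max_len=30):
    """Trim, length-select, and collapse reads into {sequence: count}."""
    trimmed = (trim_adaptor(r, adaptor) for r in reads)
    selected = (r for r in trimmed if min_len <= len(r) <= max_len)
    # Collapsing: identical sequences are stored once with their copy number.
    return Counter(selected)


def reads_per_million(count, total_reads):
    """Scale a raw count by library size (assumed normalization, see lead-in)."""
    return count * 1_000_000 / total_reads
```

Collapsing before mapping means each distinct sequence is aligned only once, with its copy number carried along. This is why the step is useful for small RNA libraries, where many reads are identical.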

Keywords: Next-generation sequencing (NGS); Perl script; pipeline; pre-processing; short reads.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosome Mapping
  • High-Throughput Nucleotide Sequencing / methods*
  • Polymorphism, Single Nucleotide
  • Programming Languages
  • Quality Control
  • Sequence Alignment / methods
  • Sequence Analysis, RNA / methods
  • Software*
  • User-Computer Interface
  • Workflow