[Automatic analysis pipeline of next-generation sequencing data]

Yi Chuan. 2014 Jun;36(6):618-24. doi: 10.3724/SP.J.1005.2014.0618.
[Article in Chinese]

Abstract

The development of next-generation sequencing has created high demand for data processing and analysis. Although many software tools exist for analyzing next-generation sequencing data, most are designed for a single function (e.g., alignment, variant calling, or annotation). It is therefore necessary to combine them into one workflow that produces results biologists can interpret. This study designed a pipeline for processing Illumina sequencing data, built on the Perl programming language and the Sun Grid Engine (SGE) system. The pipeline takes raw sequence data (FASTQ format) as input, calls standard data processing software (e.g., BWA, Samtools, GATK, and Annovar), and outputs a list of annotated variants for further analysis by researchers. Automation and parallel computation reduce manual operation and improve efficiency. Users can run the pipeline either by editing a configuration file or through a graphical interface. This work will facilitate research projects that use sequencing technology.
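
The workflow described above can be illustrated with a minimal sketch. It is written in Python rather than the authors' Perl, and it is not the published pipeline: the abstract does not state the exact commands, options, or file names used, so the helper names (`build_steps`, `qsub_lines`), the file naming scheme, the `humandb/` directory, the `hg19` build, and the GATK4-style invocation are all assumptions made for illustration. Steps a real run would need, such as read-group assignment and duplicate marking, are omitted. The sketch only builds SGE submission lines and does not execute anything.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str     # SGE job name, also used to express dependencies
    command: str  # shell command the job would run


def build_steps(sample: str, fq1: str, fq2: str, ref: str) -> List[Step]:
    """Ordered steps for one paired-end sample (illustrative commands only)."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf"
    return [
        # Alignment of FASTQ reads to the reference
        Step(f"align_{sample}", f"bwa mem {ref} {fq1} {fq2} > {sam}"),
        # Coordinate sorting and indexing
        Step(f"sort_{sample}",
             f"samtools sort -o {bam} {sam} && samtools index {bam}"),
        # Variant calling (GATK4-style syntax is an assumption)
        Step(f"call_{sample}",
             f"gatk HaplotypeCaller -R {ref} -I {bam} -O {vcf}"),
        # Annotation (database directory and genome build are assumptions)
        Step(f"annot_{sample}",
             f"table_annovar.pl {vcf} humandb/ -buildver hg19 "
             f"-out {sample} -protocol refGene -operation g -vcfinput"),
    ]


def qsub_lines(steps: List[Step]) -> List[str]:
    """Chain the steps with -hold_jid so each job waits for the previous one."""
    lines = []
    prev = None
    for step in steps:
        hold = f"-hold_jid {prev} " if prev else ""
        # qsub reads the job script from standard input when no file is given
        lines.append(
            f"echo {shlex.quote(step.command)} | "
            f"qsub -N {step.name} {hold}-cwd"
        )
        prev = step.name
    return lines


if __name__ == "__main__":
    # Each sample gets its own dependency chain, so the scheduler is free
    # to run different samples in parallel.
    for sample in ("sampleA", "sampleB"):
        steps = build_steps(sample, f"{sample}_R1.fastq",
                            f"{sample}_R2.fastq", "ref.fa")
        for line in qsub_lines(steps):
            print(line)
```

Because dependencies exist only within a sample, the chains for different samples are independent, which is one plausible way the parallel computation mentioned in the abstract could be realized on an SGE cluster.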

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Automation / instrumentation
  • Automation / methods*
  • Computational Biology / instrumentation*
  • Data Mining / methods*
  • Database Management Systems* / instrumentation
  • Databases, Genetic
  • High-Throughput Nucleotide Sequencing