[Automatic analysis pipeline of next-generation sequencing data]

Li Wenke; Li Fengyu; Zhang Siyao; Cai Bin; Zheng Na; Nie Yu; Zhou Dao; Zhao Qian

doi:10.3724/SP.J.1005.2014.0618

Yi Chuan. 2014 Jun;36(6):618-24. doi: 10.3724/SP.J.1005.2014.0618.

[Article in Chinese]

Authors

Li Wenke¹, Li Fengyu², Zhang Siyao¹, Cai Bin¹, Zheng Na¹, Nie Yu¹, Zhou Dao³, Zhao Qian¹

Affiliations

¹ State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Disease, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China.
² State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Disease, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China; College of Biomedical Engineering, South-Central University for Nationalities, Wuhan 430074, China.
³ College of Biomedical Engineering, South-Central University for Nationalities, Wuhan 430074, China.

PMID: 24929521

Abstract

The development of next-generation sequencing has created high demand for data processing and analysis. Although many software tools are available for analyzing next-generation sequencing data, most are designed for a single function (e.g., alignment, variant calling, or annotation). It is therefore necessary to combine them to analyze data and generate results that biologists can interpret. This study designed a pipeline for processing Illumina sequencing data, built on the Perl programming language and the SGE system. The pipeline takes raw sequence data (FASTQ format) as input, calls standard data processing software (e.g., BWA, Samtools, GATK, and Annovar), and outputs a list of annotated variants that researchers can analyze further. Through automation and parallel computation, the pipeline reduces manual operation and improves efficiency. Users can run the pipeline by editing a configuration file or by using the graphical interface. This work will facilitate research projects that use sequencing technology.
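
The chaining described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the Perl used by the authors, and it is not their implementation: the abstract does not give the exact commands, so the tool invocations, the file naming, the `humandb/` directory, the `hg19` build, and the helper names `build_steps` and `qsub_lines` are all assumptions made for illustration. The sketch only builds SGE `qsub` command lines and does not run them; it also omits steps a real pipeline would need, such as read-group tagging and duplicate marking. Each sample gets its own chain of jobs linked by `-hold_jid`, so different samples can run in parallel while the steps within a sample run in order.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Step:
    name: str     # short label used to build the SGE job name
    command: str  # shell command the job would run


def build_steps(sample, ref, fq1, fq2):
    """Ordered per-sample steps: align, sort, index, call variants, annotate.

    All command lines are illustrative assumptions, not taken from the paper.
    """
    return [
        Step("align", f"bwa mem {ref} {fq1} {fq2} > {sample}.sam"),
        Step("sort", f"samtools sort -o {sample}.bam {sample}.sam"),
        Step("index", f"samtools index {sample}.bam"),
        Step("call", f"gatk HaplotypeCaller -R {ref} -I {sample}.bam -O {sample}.vcf"),
        Step(
            "annotate",
            f"table_annovar.pl {sample}.vcf humandb/ -buildver hg19 "
            f"-out {sample} -protocol refGene -operation g -vcfinput",
        ),
    ]


def qsub_lines(sample, steps):
    """Turn steps into qsub command lines, each held until the previous job ends."""
    lines = []
    previous = None
    for step in steps:
        job = f"{sample}_{step.name}"
        parts = ["qsub", "-cwd", "-N", job]
        if previous is not None:
            # -hold_jid makes this job wait for the named job to finish
            parts += ["-hold_jid", previous]
        # -b y submits the command directly instead of a script file
        parts += ["-b", "y", "sh", "-c", step.command]
        lines.append(" ".join(shlex.quote(p) for p in parts))
        previous = job
    return lines


if __name__ == "__main__":
    # Hypothetical configuration: sample name -> paired FASTQ files
    config = {
        "S1": ("S1_1.fq", "S1_2.fq"),
        "S2": ("S2_1.fq", "S2_2.fq"),
    }
    for sample, (fq1, fq2) in config.items():
        for line in qsub_lines(sample, build_steps(sample, "ref.fa", fq1, fq2)):
            print(line)
```

Keeping the step list separate from the submission logic mirrors the idea of a configuration-driven pipeline: changing the samples or swapping a tool only touches the data passed in, not the scheduling code.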

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Automation / instrumentation
Automation / methods*
Computational Biology / instrumentation*
Data Mining / methods*
Database Management Systems* / instrumentation
Databases, Genetic
High-Throughput Nucleotide Sequencing