SeqAcademy: an educational pipeline for RNA-Seq and ChIP-Seq analysis

F1000Res. 2018 May 22:7:ISCB Comm J-628. doi: 10.12688/f1000research.14880.4. eCollection 2018.

Abstract

Quantification of gene expression and characterization of gene transcript structures are central problems in molecular biology. RNA sequencing (RNA-Seq) and chromatin immunoprecipitation sequencing (ChIP-Seq) are important methods, but can be cumbersome and difficult for beginners to learn. To teach interested students and scientists how to analyze RNA-Seq and ChIP-Seq data, we present a start-to-finish tutorial for analyzing RNA-Seq and ChIP-Seq data: SeqAcademy ( source code: https://github.com/NCBI-Hackathons/seqacademy, webpage: http://www.seqacademy.org/). This user-friendly pipeline, fully written in markdown language, emphasizes the use of publicly available RNA-Seq and ChIP-Seq data and strings together popular tools that bridge that gap between raw sequencing reads and biological insight. We demonstrate practical and conceptual considerations for various RNA-Seq and ChIP-Seq analysis steps with a biological use case - a previously published yeast experiment. This work complements existing sophisticated RNA-Seq and ChIP-Seq pipelines designed for advanced users by gently introducing the critical components of RNA-Seq and ChIP-Seq analysis to the novice bioinformatician. In conclusion, this well-documented pipeline will introduce state-of-the-art RNA-Seq and ChIP-Seq analysis tools to beginning bioinformaticians and help facilitate the analysis of the burgeoning amounts of public RNA-Seq and ChIP-Seq data.

Keywords: ChIP-Seq; RNA-Seq; alignment; differential gene expression; education; peak-calling; pipeline; tutorial.

Grants and funding

This research was supported by the Intramural Research Program of the NIH, National Library of Medicine as well as the Intramural Research Program of the National Institute on Aging.