edgeR: a versatile tool for the analysis of shRNA-seq and CRISPR-Cas9 genetic screens

F1000Res. 2014 Apr 24:3:95. doi: 10.12688/f1000research.3928.2. eCollection 2014.

Abstract

Pooled library sequencing screens that perturb gene function in a high-throughput manner are becoming increasingly popular in functional genomics research. Irrespective of the mechanism by which loss of function is achieved, via either RNA interference using short hairpin RNAs (shRNAs) or genetic mutation using single guide RNAs (sgRNAs) with the CRISPR-Cas9 system, there is a need to establish optimal analysis tools to handle such data. Our open-source processing pipeline in edgeR provides a complete analysis solution for screen data that begins with the raw sequence reads and ends with a ranked list of candidate genes for downstream biological validation. We first summarize the raw data contained in a fastq file into a matrix of counts (samples in the columns, genes in the rows), with options for allowing mismatches and small shifts in sequence position. Diagnostic plots, normalization and differential representation analysis can then be performed using established methods to prioritize results in a statistically rigorous way, with the choice of either the classic exact testing methodology or generalized linear modeling that can handle complex experimental designs. A detailed users' guide that demonstrates how to analyze screen data in edgeR and a point-and-click implementation of this workflow in Galaxy are also provided. The edgeR package is freely available from http://www.bioconductor.org.
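
The counting step described above can be illustrated with a minimal Python sketch. This is not the edgeR implementation, which is written for R; it only shows the general idea of tallying reads against a library of known shRNA or sgRNA sequences while tolerating a limited number of mismatches and a small positional shift. The function names (`read_fastq_sequences`, `match_guide`, `count_matrix`), their parameters, and the tie-handling rule (a read that is equally close to several library sequences is discarded) are all assumptions made for illustration, and the library sequences are assumed to share one length.

```python
def read_fastq_sequences(lines):
    """Yield the sequence line (second of every four) from FASTQ text lines."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def match_guide(read, by_seq, start, length, max_mismatch=0, max_shift=0):
    """Return the library entry matched by a read, or None (hypothetical helper).

    by_seq maps library sequence -> name; start is the expected 0-based
    position of the library sequence within the read.
    """
    # Candidate windows, expected position first, then increasing shifts.
    windows = []
    for offset in sorted(range(-max_shift, max_shift + 1), key=abs):
        pos = start + offset
        if pos >= 0 and pos + length <= len(read):
            windows.append(read[pos:pos + length])
    # Exact matches take priority over mismatched ones.
    for window in windows:
        if window in by_seq:
            return by_seq[window]
    if max_mismatch > 0:
        for window in windows:
            hits = [name for seq, name in by_seq.items()
                    if hamming(window, seq) <= max_mismatch]
            if len(hits) == 1:  # assumption: ambiguous reads are discarded
                return hits[0]
    return None


def count_matrix(samples, guides, start=0, max_mismatch=0, max_shift=0):
    """Build a count table with library entries in rows and samples in columns.

    samples maps sample name -> iterable of FASTQ lines;
    guides maps library entry name -> sequence.
    Returns (sample_names, {entry name: [count per sample]}).
    """
    by_seq = {seq: name for name, seq in guides.items()}
    length = len(next(iter(by_seq)))
    sample_names = list(samples)
    matrix = {name: [0] * len(sample_names) for name in guides}
    for col, sample in enumerate(sample_names):
        for read in read_fastq_sequences(samples[sample]):
            hit = match_guide(read, by_seq, start, length,
                              max_mismatch, max_shift)
            if hit is not None:
                matrix[hit][col] += 1
    return sample_names, matrix
```

Calling `count_matrix` with `max_mismatch=0` and `max_shift=0` counts only reads that carry an exact library sequence at the expected position; relaxing either parameter recovers reads affected by sequencing errors or small offsets, at the cost of scanning the whole library for each unmatched read.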

Grants and funding

This research was supported by NHMRC Project grants 1050661 (MER) and 1059622 (MER and MEB), Victorian State Government Operational Infrastructure Support and Australian Government NHMRC IRIISS.