A Hypothesis Testing Based Method for Normalization and Differential Expression Analysis of RNA-Seq Data

PLoS One. 2017 Jan 10;12(1):e0169594. doi: 10.1371/journal.pone.0169594. eCollection 2017.

Abstract

Next-generation sequencing technologies have made RNA sequencing (RNA-seq) a popular choice for measuring gene expression levels. To reduce the noise in gene expression measures and to compare them across conditions or samples, normalization is an essential step that adjusts for varying sample sequencing depths and other unwanted technical effects. In this paper, we develop a novel global scaling normalization method that exploits available knowledge of housekeeping genes. We formulate the problem from a hypothesis testing perspective and find an optimal scaling factor that minimizes the deviation between the empirical and the nominal type I error. Applying our approach to various simulation studies and real examples, we demonstrate that it is more accurate and robust than state-of-the-art alternatives in detecting differentially expressed genes.
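
The idea of choosing a scaling factor by matching the empirical type I error on housekeeping genes to the nominal level can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the test statistic or the search procedure, so the sketch assumes one sample per condition, a normal-approximation binomial test on each housekeeping gene, a simple grid search, and the median as a tie-breaker. The function names (`empirical_type1_error`, `choose_scale`) and the simulated counts are hypothetical and serve only as an illustration.

```python
import numpy as np
from statistics import NormalDist


def empirical_type1_error(x, y, scale, alpha=0.05):
    """Fraction of housekeeping genes declared different at level alpha.

    Assumption: sample y is sequenced `scale` times as deeply as sample x,
    so under the null a read from a gene falls in x with probability
    1 / (1 + scale). Each gene is tested with a two-sided z-test.
    """
    n = x + y
    keep = n > 0                      # genes with no reads carry no information
    p0 = 1.0 / (1.0 + scale)
    z = (x[keep] - n[keep] * p0) / np.sqrt(n[keep] * p0 * (1.0 - p0))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return float(np.mean(np.abs(z) > z_crit))


def choose_scale(x, y, grid, alpha=0.05):
    """Grid value whose empirical type I error is closest to alpha.

    The error is a step function of the scale, so ties are common;
    the median of the tied grid values is returned (an arbitrary choice).
    """
    errors = np.array([empirical_type1_error(x, y, s, alpha) for s in grid])
    gaps = np.abs(errors - alpha)
    tied = np.flatnonzero(np.isclose(gaps, gaps.min()))
    return float(np.median(grid[tied]))


# Simulated housekeeping-gene counts (synthetic, not data from the paper):
# sample y has twice the sequencing depth of sample x.
rng = np.random.default_rng(0)
mu = rng.uniform(100, 1000, size=500)       # baseline expression per gene
x = rng.poisson(mu).astype(float)
y = rng.poisson(2.0 * mu).astype(float)

grid = np.linspace(0.5, 4.0, 351)
estimate = choose_scale(x, y, grid)
print("estimated depth ratio (y relative to x):", estimate)
```

Because housekeeping genes are assumed not to be differentially expressed, any rejection among them is a false positive. A badly chosen scaling factor inflates the rejection rate, while the factor that brings it back to the nominal level is taken as the normalization constant.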

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Embryonic Stem Cells
  • Gene Expression Profiling* / methods
  • Gene Expression Regulation*
  • Genes, Essential
  • High-Throughput Nucleotide Sequencing*
  • Kidney / metabolism
  • Liver / metabolism

Grants and funding

This work was supported by the Tianyuan Fund for Mathematics (No. 11526143), the Doctor Start Fund of Guangdong Province (No. 2016A030310062 (85118-000043)) and the Natural Science Foundation of SZU (No. 836-00008303) to Yan Zhou. This work was also supported by the National Science Foundation of China (Nos. 11501248 and 11601094) to Guochang Wang and by the Tianyuan Fund for Mathematics (No. 11626160) to Han Li. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.