Random rotation for identifying differentially expressed genes with linear models following batch effect correction

Bioinformatics. 2021 Aug 9;37(15):2142-2149. doi: 10.1093/bioinformatics/btab063.

Abstract

Motivation: Data generated from high-throughput technologies such as sequencing, microarray and bead-chip technologies are unavoidably affected by batch effects (BEs). Considerable effort has been put into developing methods for correcting these effects. Often, BE correction and hypothesis testing cannot be done within a single model, but are instead performed successively with separate models in data analysis pipelines. Because BE correction alters the data, this can lead to biased P-values or false discovery rates.

Results: We present a novel approach for estimating null distributions of test statistics in data analysis pipelines where BE correction is followed by linear model analysis. The approach generates simulated datasets by random rotation and thereby adequately retains the dependence structure among genes. This allows estimation of null distributions of dependent test statistics, and thus the calculation of resampling-based P-values and false discovery rates following BE correction while maintaining the alpha level.
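
To make the idea of random rotation concrete, the following is a minimal Python sketch of the basic, unrestricted form of the technique: the sample dimension of a features-by-samples matrix is multiplied by a random orthogonal matrix. It is not the randRotation package API and not the authors' implementation. The function names (random_rotation_matrix, rotate_samples) and the toy data are assumptions made for illustration, and the sketch omits the design matrix and batch structure that a complete analysis would have to respect.

```python
import numpy as np


def random_rotation_matrix(n, rng):
    """Draw an n x n random orthogonal matrix via QR of a Gaussian matrix."""
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # Flip column signs according to diag(r) so the draw is uniform
    # (Haar-distributed) over the orthogonal group.
    d = np.sign(np.diag(r))
    d[d == 0] = 1.0
    return q * d


def rotate_samples(y, rng):
    """Rotate a features x samples matrix along the sample dimension.

    Hypothetical helper: unrestricted rotation only, with no design
    matrix or batch information taken into account.
    """
    n_samples = y.shape[1]
    r = random_rotation_matrix(n_samples, rng)
    return y @ r.T


rng = np.random.default_rng(0)
y = rng.standard_normal((5, 8))  # toy data: 5 genes, 8 samples
y_rot = rotate_samples(y, rng)

# The gene-by-gene cross-product matrix is unchanged by the rotation,
# which is the sense in which the dependence among genes is retained.
print(np.allclose(y_rot @ y_rot.T, y @ y.T))
```

In a resampling scheme of the kind described above, each rotated dataset would presumably be passed through the same BE correction and linear model steps as the original data, and the resulting statistics would be collected to form an empirical null distribution.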

Availability: The described methods are implemented in the randRotation package, available on Bioconductor: https://bioconductor.org/packages/randRotation/.

Contact: p.hettegger@gmail.com.

Supplementary information: Supplementary data are available at Bioinformatics online.