A robust removing unwanted variation-testing procedure via γ-divergence

Biometrics. 2019 Jun;75(2):650-662. doi: 10.1111/biom.13002. Epub 2019 Aug 20.

Abstract

Identification of differentially expressed genes (DE genes) is commonly conducted in modern biomedical research. However, unwanted variation inevitably arises during the data collection process, which can heavily bias the detection results. Various methods have been suggested for removing the unwanted variation while keeping the biological variation to ensure a reliable analysis result. Removing unwanted variation (RUV), which relies on negative control genes, has recently been proposed for this purpose. However, outliers frequently appear in modern high-throughput genetic data and can heavily affect the performance of RUV and its downstream analysis. In this work, we propose a robust RUV-testing procedure (a robust RUV procedure to remove unwanted variation, followed by a robust testing procedure to identify DE genes) via γ-divergence. The advantages of our method are twofold: (a) it does not involve any modeling of the outlier distribution, which makes it applicable to various situations; (b) it is easy to implement in the sense that its robustness is controlled by a single tuning parameter γ of the γ-divergence, and a data-driven criterion is developed to select γ. When applied to real data sets, our method successfully removed unwanted variation and identified more DE genes than conventional methods.
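
To illustrate how a single tuning parameter γ can control robustness, the following minimal Python sketch estimates the mean of a normal sample with known scale by minimizing the γ-divergence. This is not the RUV-testing procedure of the paper; it only shows the generic mechanism, under the assumption of a normal model, in which each observation receives a weight proportional to its model density raised to the power γ. The function name `gamma_mean`, the fixed scale `sigma`, the iteration count, and the toy data are all hypothetical choices made for this illustration.

```python
import math
import statistics


def gamma_mean(x, gamma, sigma=1.0, n_iter=100):
    """Location estimate under a normal model with known scale `sigma`,
    obtained by fixed-point iteration on the gamma-divergence estimating
    equation. With gamma = 0 every weight equals 1 and the result is the
    ordinary sample mean.
    """
    mu = statistics.median(x)  # robust starting value (an illustrative choice)
    for _ in range(n_iter):
        # Weight of each point is proportional to its normal density at the
        # current mu, raised to the power gamma; distant points get ~0 weight.
        w = [math.exp(-gamma * (xi - mu) ** 2 / (2.0 * sigma ** 2)) for xi in x]
        mu = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    return mu


# Toy data: five points symmetric around 0 plus one gross outlier.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 50.0]
plain = gamma_mean(data, gamma=0.0)   # ordinary mean, pulled toward the outlier
robust = gamma_mean(data, gamma=0.5)  # outlier is effectively ignored
```

In this toy setting a larger γ down-weights the outlying observation more aggressively, while γ = 0 recovers the non-robust estimate. The paper selects γ with a data-driven criterion, which this sketch does not attempt to reproduce.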

Keywords: multiple hypothesis testing; negative control genes; removing unwanted variation; robustness; unwanted variation; γ-divergence.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Interpretation, Statistical
  • Gene Expression Profiling / methods*
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Humans
  • Models, Genetic*