A multivariate statistical test for differential expression analysis

Sci Rep. 2022 May 18;12(1):8265. doi: 10.1038/s41598-022-12246-w.

Abstract

Statistical tests of differential expression usually suffer from two problems. First, their statistical power is often limited when applied to small and skewed data sets. Second, gene expression data are usually discretized by applying arbitrary criteria to limit the number of false positives. In this work, a new statistical test obtained from a convolution of multivariate hypergeometric distributions, the Hy-test, is proposed to address these issues. The Hy-test was applied to transcriptomic data from breast and kidney cancer tissues and compared with other differential expression analysis methods. The Hy-test allows implicit discretization of the expression profiles and is more selective in retrieving both differentially expressed genes and Gene Ontology terms. It can be adopted together with other tests to retrieve information that would otherwise remain hidden, e.g., terms related to (1) cell cycle deregulation for breast cancer and (2) "programmed cell death" for kidney cancer.
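
The abstract does not spell out how the Hy-test statistic is constructed, so the following is only a sketch of the two building blocks it names: a multivariate hypergeometric probability and a convolution of two such distributions. The function names (`mv_hypergeom_pmf`, `support`, `convolve`) and the toy urn sizes are assumptions made for illustration, and brute-force enumeration is used for clarity. It should not be read as the authors' implementation.

```python
from itertools import product
from math import comb


def mv_hypergeom_pmf(counts, population):
    """P(X = counts) when sum(counts) items are drawn without replacement
    from an urn holding population[i] items of category i."""
    n = sum(counts)
    total_items = sum(population)
    numerator = 1
    for k, K in zip(counts, population):
        numerator *= comb(K, k)  # comb(K, k) is 0 when k > K
    return numerator / comb(total_items, n)


def support(population, n):
    """All count vectors of total n that fit inside the urn."""
    ranges = [range(min(K, n) + 1) for K in population]
    return [c for c in product(*ranges) if sum(c) == n]


def convolve(population_a, n_a, population_b, n_b):
    """Distribution of the element-wise sum of two independent
    multivariate hypergeometric draws (brute-force enumeration)."""
    out = {}
    for ca in support(population_a, n_a):
        pa = mv_hypergeom_pmf(ca, population_a)
        for cb in support(population_b, n_b):
            pb = mv_hypergeom_pmf(cb, population_b)
            total = tuple(x + y for x, y in zip(ca, cb))
            out[total] = out.get(total, 0.0) + pa * pb
    return out


# Toy example (assumed numbers): two categories, two independent urns.
conv = convolve((2, 2), 2, (3, 1), 1)
```

Here `conv` maps each summed count vector to its probability. Enumeration of this kind is only practical for very small urns; real expression data would need a more efficient scheme.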

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms* / genetics
  • Female
  • Gene Expression Profiling / methods
  • Gene Ontology
  • Humans
  • Kidney Neoplasms* / genetics
  • Models, Statistical