Adapted tensor decomposition and PCA based unsupervised feature extraction select more biologically reasonable differentially expressed genes than conventional methods

Sci Rep. 2022 Oct 19;12(1):17438. doi: 10.1038/s41598-022-21474-z.

Abstract

Tensor decomposition- and principal component analysis-based unsupervised feature extraction were proposed almost 5 and 10 years ago, respectively; although these methods have been successfully applied to a wide range of genome analyses, including drug repositioning, biomarker identification, and disease-causing genes' identification, some fundamental problems have been identified: the number of genes identified was too small to assume that there were no false negatives, and the histogram of P values derived was not fully coincident with the null hypothesis that principal component and singular value vectors follow the Gaussian distribution. Optimizing the standard deviation such that the histogram of P values is as much as possible coincident with the null hypothesis results in an increase in the number and biological reliability of the selected genes. Our contribution was that we improved these methods so as to be able to select biologically more reasonable differentially expressed genes than the state of art methods that must empirically assume negative binomial distributions and dispersion relation, which is required for the selecting more expressed genes than less expressed ones, which can be achieved by the proposed methods that do not have to assume these.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Biomarkers
  • Normal Distribution
  • Principal Component Analysis
  • Reproducibility of Results

Substances

  • Biomarkers