Flexible Non-Negative Matrix Factorization to Unravel Disease-Related Genes

IEEE/ACM Trans Comput Biol Bioinform. 2019 Nov-Dec;16(6):1948-1957. doi: 10.1109/TCBB.2018.2823746. Epub 2018 Apr 6.

Abstract

Recently, non-negative matrix factorization (NMF) has been shown to perform well in the analysis of omics data. NMF assumes that the expression level of a gene is a linear additive combination of metagenes. The elements of the metagene matrix represent regulation effects and are restricted to be non-negative. Biologically, however, there are two kinds of regulation effects: up-regulation and down-regulation. Few NMF-based methods have taken this into account. We therefore designed a flexible non-negative matrix factorization (FNMF) algorithm that further considers the biological meaning of gene expression data. It allows negative numbers in the metagene matrix, where they represent down-regulation effects. We separated gene expression data into disease-driven gene expression and background gene expression. We then computed the disease-driven relative expression of each gene and obtained a ranked list of genes. The top-ranked genes are considered to be involved in disease-related biological processes. Experimental results on two real-world gene expression datasets demonstrate the feasibility and effectiveness of FNMF. Compared with conventional disease-related gene identification algorithms, FNMF performs better in analyzing gene expression data of diseases with complex pathology.
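The pipeline summarized above (a factorization whose metagene matrix may contain negative entries, a split into background and disease-driven expression, and a ranking by relative expression) can be illustrated with a small sketch. This is not the published FNMF algorithm: the abstract gives neither its objective nor its update rules. As a stand-in, the sketch uses semi-NMF (Ding, Li and Jordan), which factorizes X ≈ W H with H non-negative and W of any sign, so W plays the role of a mixed-sign metagene matrix. The separation and ranking rules are hypothetical choices made only for illustration: background expression is the mean of the low-rank reconstruction over control samples, disease-driven expression is the disease-sample mean minus that background, and relative expression is their ratio. The function names `semi_nmf` and `rank_genes` and the synthetic data are likewise assumptions, not from the paper.

```python
import numpy as np


def pos(a):
    """Elementwise positive part: (|a| + a) / 2."""
    return (np.abs(a) + a) / 2.0


def neg(a):
    """Elementwise negative part: (|a| - a) / 2, so a == pos(a) - neg(a)."""
    return (np.abs(a) - a) / 2.0


def semi_nmf(X, k, n_iter=1000, seed=0, eps=1e-12):
    """Semi-NMF stand-in (NOT the published FNMF): X ~= W @ H, H >= 0, W any sign.

    X: genes x samples. W: genes x k (mixed-sign "metagene" loadings).
    H: k x samples (non-negative coefficients).
    Returns W, H and the relative reconstruction error after each iteration.
    """
    rng = np.random.default_rng(seed)
    G = rng.uniform(0.1, 1.1, size=(X.shape[1], k))  # G = H.T, strictly positive start
    errors = []
    for _ in range(n_iter):
        # Least-squares update of the sign-unconstrained factor: F = X G (G^T G)^+
        F = X @ G @ np.linalg.pinv(G.T @ G)
        XtF = X.T @ F
        FtF = F.T @ F
        # Semi-NMF multiplicative update; the square-root ratio is >= 0, so G stays >= 0
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) / (neg(XtF) + G @ pos(FtF) + eps))
        errors.append(np.linalg.norm(X - F @ G.T) / np.linalg.norm(X))
    return F, G.T, errors


def rank_genes(X, is_disease, k=2, **kw):
    """Hypothetical ranking rule for illustration, not the paper's formula."""
    W, H, errors = semi_nmf(X, k, **kw)
    R = W @ H  # denoised low-rank expression
    background = R[:, ~is_disease].mean(axis=1)  # control-level expression per gene
    disease_driven = R[:, is_disease].mean(axis=1) - background  # signed shift
    relative = disease_driven / (np.abs(background) + 1e-12)  # > 0 up, < 0 down
    order = np.argsort(-np.abs(relative))  # largest relative change first
    return order, relative, W, H, errors


# Synthetic example (invented data): 10 control and 10 disease samples,
# genes 0-4 planted as up-regulated and genes 5-9 as down-regulated.
rng = np.random.default_rng(1)
n_genes, n_ctrl, n_dis = 50, 10, 10
is_disease = np.array([False] * n_ctrl + [True] * n_dis)
base = rng.uniform(5.0, 10.0, size=n_genes)
delta = np.zeros(n_genes)
delta[:5] = 3.0
delta[5:10] = -3.0
X = base[:, None] + np.outer(delta, is_disease.astype(float))
X += rng.normal(0.0, 0.1, size=X.shape)

order, relative, W, H, errors = rank_genes(X, is_disease)
print("top 10 genes:", sorted(order[:10].tolist()))
print("signs of relative expression:", np.sign(relative[:10]))
```

Two design choices are worth noting. `np.linalg.pinv` is used instead of an explicit inverse so that a nearly singular `G.T @ G` does not raise. Genes are ranked by the absolute relative change, so strongly down-regulated genes rise to the top alongside up-regulated ones, while the sign of `relative` records the direction of regulation.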

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Area Under Curve
  • Computational Biology / methods*
  • Diabetes Mellitus, Type 2 / genetics*
  • Diabetes Mellitus, Type 2 / metabolism
  • Disease Progression
  • Gene Expression Profiling
  • Gene Expression Regulation*
  • Genomics
  • Humans
  • Huntington Disease / genetics*
  • Huntington Disease / metabolism
  • Linear Models
  • Mice
  • Phenotype
  • ROC Curve
  • Reproducibility of Results
  • Signal Transduction