UNMF: a unified nonnegative matrix factorization for multi-dimensional omics data

Brief Bioinform. 2023 Sep 20;24(5):bbad253. doi: 10.1093/bib/bbad253.

Abstract

Factor analysis, ranging from principal component analysis to nonnegative matrix factorization, represents a foremost approach in analyzing multi-dimensional data to extract valuable patterns, and is increasingly being applied in the context of multi-dimensional omics datasets represented in tensor form. However, traditional analytical methods are heavily dependent on the format and structure of the data itself, and if these change even slightly, the analyst must change their data analysis strategy and techniques and spend a considerable amount of time on data preprocessing. Additionally, many traditional methods cannot be applied as-is in the presence of missing values in the data. We present a new statistical framework, unified nonnegative matrix factorization (UNMF), for finding informative patterns in messy biological data sets. UNMF is designed for tidy data format and structure, making data analysis easier and simplifying the development of data analysis tools. UNMF can handle a wide range of data structures and formats, and works seamlessly with tensor data including missing observations and repeated measurements. The usefulness of UNMF is demonstrated through its application to several multi-dimensional omics data, offering user-friendly and unified features for analysis and integration. Its application holds great potential for the life science community. UNMF is implemented with R and is available from GitHub (https://github.com/abikoushi/moltenNMF).

Keywords: Bayesian statistics; factor analysis; gene expression analysis; metagenome; multi-dimensional data; nonnegative matrix factorization.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Factor Analysis, Statistical
  • Multiomics*
  • Principal Component Analysis