FaStaNMF: a Fast and Stable Non-negative Matrix Factorization for Gene Expression

IEEE/ACM Trans Comput Biol Bioinform. 2023 Jul 19:PP. doi: 10.1109/TCBB.2023.3296979. Online ahead of print.

Abstract

Gene expression analysis of samples with mixed cell types provides only limited insight into the characteristics of specific tissues. In silico deconvolution can be applied to extract cell-type-specific expression, thus avoiding prohibitively expensive techniques such as cell sorting or single-cell sequencing. Non-negative matrix factorization (NMF) is a deconvolution method shown to be useful for gene expression data, in part due to its constraint of non-negativity. Unlike other methods, NMF can deconvolve without prior knowledge of the components of the model. However, NMF is not guaranteed to provide a globally unique solution. In this work, we present FaStaNMF, a method that balances achieving global stability of the NMF results, which is essential for inter-experiment and inter-lab reproducibility, with accuracy and speed. Results: FaStaNMF was applied to four datasets with known ground truth, created based on publicly available data or by using our simulation infrastructure, RNAGinesis. We assessed FaStaNMF on three criteria (speed, accuracy, and stability), and it compared favorably to the standard approach of achieving reproducible results with NMF. We expect that FaStaNMF can be applied successfully to a wide array of biological data, such as different tumor/immune and other disease microenvironments.
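
To make the non-uniqueness issue concrete, the following is a minimal sketch of plain NMF run from several random starting points. It is not FaStaNMF and does not reproduce the authors' method. It assumes the classic Lee-Seung multiplicative updates for the Frobenius loss, a small synthetic "genes x samples" matrix, and hypothetical names (`nmf_multiplicative`, `true_W`, `true_H`) introduced only for illustration.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=500, seed=0, eps=1e-10):
    """Approximate V (genes x samples) as W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates for the Frobenius loss.
    This is an illustrative baseline, not the FaStaNMF algorithm.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    # Random non-negative initialization; the seed determines the start point.
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Element-wise updates keep both factors non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic mixture (assumed toy data): 50 genes, 20 samples, 3 cell types.
data_rng = np.random.default_rng(42)
true_W = data_rng.random((50, 3))   # hypothetical cell-type expression profiles
true_H = data_rng.random((3, 20))   # hypothetical per-sample mixing weights
V = true_W @ true_H

# Run NMF from several random initializations and record the relative error.
results = [nmf_multiplicative(V, k=3, seed=s) for s in range(5)]
errors = [np.linalg.norm(V - W @ H) / np.linalg.norm(V) for W, H in results]

for s, err in enumerate(errors):
    print(f"seed={s}  relative reconstruction error={err:.4f}")
```

Each seed can reach a similar reconstruction error while returning different factors `W` and `H`, which is why repeated runs are commonly compared or aggregated when reproducible components are needed.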