INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis

PLoS Genet. 2024 Mar 14;20(3):e1011189. doi: 10.1371/journal.pgen.1011189. eCollection 2024 Mar.

Abstract

RNA sequencing (RNA-Seq) is widely used to capture transcriptome dynamics across tissues, biological entities, and conditions. Currently, few, if any, methods can handle multiple biological variables (e.g., tissues/phenotypes) and their interactions simultaneously while also achieving dimension reduction (DR). We propose INSIDER, a general and flexible statistical framework based on matrix factorization, which is freely available at https://github.com/kai0511/insider. INSIDER decomposes variation from different biological variables and their interactions into a shared low-rank latent space. In particular, it introduces the elastic net penalty to induce sparsity while accounting for the grouping effects of genes. It can achieve DR of high-dimensional data (≥ 3 dimensions), as opposed to conventional methods (e.g., PCA/NMF), which generally handle only 2D data (e.g., sample × expression). In addition, it enables computing 'adjusted' expression profiles for specific biological variables while controlling for variation from other variables. INSIDER is computationally efficient and accommodates missing data. In simulations, INSIDER performed similarly to or better than a closely related competing method, SDA, and it can handle complex missing-data patterns in RNA-Seq data. Moreover, unlike SDA, it can be used when the data cannot be structured into a tensor. Lastly, we demonstrate its usefulness via real data analysis, including clustering donors for disease subtyping, revealing neurodevelopmental trajectories using the BrainSpan data, and uncovering biological processes contributing to variables of interest (e.g., disease status and tissue) and their interactions.
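The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the INSIDER implementation and does not use the package's API. It assumes a simplified model in which a sample's expression vector is approximated by the sum of latent vectors for its phenotype, its tissue, and their interaction, multiplied by a shared gene loading matrix V. The function name `fit_sketch`, the ridge penalty on the covariate representations, the proximal-gradient updates, and all parameter values are assumptions chosen for illustration; missing entries are encoded as NaN and simply masked out of the loss.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm (produces exact zeros)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fit_sketch(Y, pheno, tissue, rank=3, lam=0.5, alpha=0.5, ridge=0.1,
               n_iter=300, seed=0):
    """Hypothetical sketch: Y ~ X @ U @ V with an elastic net penalty on V.

    Y      : samples x genes matrix, NaN marks missing entries
    pheno  : integer phenotype label per sample
    tissue : integer tissue label per sample
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    mask = ~np.isnan(Y)                 # observed entries only
    Y0 = np.where(mask, Y, 0.0)
    n_p, n_t = int(pheno.max()) + 1, int(tissue.max()) + 1
    rows = np.arange(n)
    # One-hot design: [phenotype | tissue | phenotype x tissue interaction]
    X = np.zeros((n, n_p + n_t + n_p * n_t))
    X[rows, pheno] = 1.0
    X[rows, n_p + tissue] = 1.0
    X[rows, n_p + n_t + pheno * n_t + tissue] = 1.0
    U = rng.normal(scale=0.1, size=(X.shape[1], rank))  # covariate latents
    V = rng.normal(scale=0.1, size=(rank, m))           # gene loadings
    x_norm2 = np.linalg.norm(X, 2) ** 2

    def objective(U, V):
        R = mask * (X @ U @ V - Y0)
        penalty_v = alpha * np.abs(V).sum() + 0.5 * (1 - alpha) * np.sum(V ** 2)
        return 0.5 * np.sum(R ** 2) + 0.5 * ridge * np.sum(U ** 2) + lam * penalty_v

    losses = [objective(U, V)]
    for _ in range(n_iter):
        # Gradient step on U (ridge-penalised), step size 1 / Lipschitz bound
        R = mask * (X @ U @ V - Y0)
        L_u = x_norm2 * np.linalg.norm(V, 2) ** 2 + ridge
        U = U - (X.T @ R @ V.T + ridge * U) / L_u
        # Proximal gradient step on V (elastic net)
        Z = X @ U
        R = mask * (Z @ V - Y0)
        L_v = np.linalg.norm(Z, 2) ** 2 + lam * (1 - alpha) + 1e-12
        grad = Z.T @ R + lam * (1 - alpha) * V
        V = soft_threshold(V - grad / L_v, lam * alpha / L_v)
        losses.append(objective(U, V))
    return U, V, np.array(losses)

# Simulated example: 2 phenotypes x 3 tissues, 60 samples, 40 genes
rng = np.random.default_rng(1)
pheno = np.repeat([0, 1], 30)
tissue = np.tile([0, 1, 2], 20)
pheno_eff = rng.normal(size=(2, 40))
tissue_eff = rng.normal(size=(3, 40))
Y = pheno_eff[pheno] + tissue_eff[tissue] + 0.1 * rng.normal(size=(60, 40))
Y[rng.random(Y.shape) < 0.05] = np.nan   # about 5% missing values

U, V, losses = fit_sketch(Y, pheno, tissue, rank=3)
# Tissue-level profiles in gene space, excluding phenotype and interaction terms
tissue_profiles = U[2:5] @ V
```

In this sketch, `tissue_profiles` loosely mirrors the 'adjusted' expression profiles mentioned in the abstract: it maps only the tissue latent vectors back to gene space, leaving out the phenotype and interaction components. Because the samples need only carry covariate labels, the data do not have to form a complete tensor.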

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Data Analysis
  • Gene Expression Profiling / methods
  • RNA / genetics
  • Sequence Analysis, RNA
  • Single-Cell Analysis / methods
  • Transcriptome* / genetics

Substances

  • RNA

Grants and funding

ZL has been supported by the Chinese University of Hong Kong startup grant (4930181), the Chinese University of Hong Kong Science Faculty’s Collaborative Research Impact Matching Scheme (CRIMS 4620033), the Chinese University of Hong Kong direct grants (4053540, 4053586), and the Hong Kong Research Grant Council (GRF 14301120, 14300923). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.