Matrix factorization for biomedical link prediction and scRNA-seq data imputation: an empirical survey

Brief Bioinform. 2022 Jan 17;23(1):bbab479. doi: 10.1093/bib/bbab479.

Abstract

Advances in high-throughput experimental technologies have led to the accumulation of vast amounts of biomedical data. Biomedical link prediction and single-cell RNA-sequencing (scRNA-seq) data imputation are two essential tasks in biomedical data analysis; they facilitate various downstream studies and yield insights into the mechanisms of complex diseases. Both tasks can be transformed into matrix completion problems. For a variety of matrix completion tasks, matrix factorization has shown promising performance. However, the sparseness and high dimensionality of biomedical networks and scRNA-seq data have raised new challenges. To resolve these issues, various matrix factorization methods have emerged recently. In this paper, we present a comprehensive review of such matrix factorization methods and their usage in biomedical link prediction and scRNA-seq data imputation. Moreover, we select representative matrix factorization methods and conduct a systematic empirical comparison on 15 real data sets to evaluate their performance under different scenarios. By summarizing the experimental results, we provide general guidelines for selecting matrix factorization methods for different biomedical matrix completion tasks and point out some future directions to further improve performance on biomedical link prediction and scRNA-seq data imputation.
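
To illustrate how a link prediction task can be cast as matrix completion, the following is a minimal sketch of a generic masked matrix factorization fitted by gradient descent. It is not one of the methods compared in the survey. The function name `masked_mf`, its hyperparameters, and the toy association matrix are assumptions made for illustration only, and the toy example treats observed zeros as known non-links, which real biomedical networks may not justify.

```python
import numpy as np


def masked_mf(X, mask, rank=2, lam=0.1, lr=0.05, epochs=2000, seed=0):
    """Fit X ~ U @ V.T using only entries where mask == 1 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    # Small random initialization of the two factor matrices.
    U = 0.1 * rng.standard_normal((n_rows, rank))
    V = 0.1 * rng.standard_normal((n_cols, rank))
    for _ in range(epochs):
        # Residual restricted to observed entries; hidden entries contribute 0.
        R = mask * (X - U @ V.T)
        # Gradients of 0.5*||R||_F^2 + 0.5*lam*(||U||_F^2 + ||V||_F^2).
        grad_U = -(R @ V) + lam * U
        grad_V = -(R.T @ U) + lam * V
        U = U - lr * grad_U
        V = V - lr * grad_V
    return U, V


# Toy association matrix (e.g. rows = drugs, columns = targets), two blocks.
X = np.array(
    [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ],
    dtype=float,
)

# mask == 0 marks entries hidden from training, i.e. the links to predict.
mask = np.ones_like(X)
mask[0, 1] = 0.0
mask[4, 2] = 0.0

U, V = masked_mf(X, mask)
X_hat = U @ V.T  # completed matrix: scores for every row/column pair

print("score for hidden pair (0, 1):", X_hat[0, 1])
print("score for hidden pair (4, 2):", X_hat[4, 2])
```

The scores at the masked positions serve as predictions for the held-out pairs. The same formulation applies to scRNA-seq imputation if `X` is an expression matrix and `mask` marks the entries considered reliable, although in practice deciding which zeros are dropouts is itself part of the problem.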

Keywords: biomedical network; data imputation; link prediction; matrix factorization; scRNA-seq data.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Data Analysis*
  • Exome Sequencing
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods