scHOIS: Determining Cell Heterogeneity Through Hierarchical Clustering Based on Optimal Imputation Strategy

IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1431-1444. doi: 10.1109/TCBB.2022.3203592.

Abstract

Advances in single-cell RNA sequencing (scRNA-seq) technology provide unbiased, high-throughput analysis at single-cell resolution and further facilitate the development of cellular heterogeneity analysis. Despite the promise of scRNA-seq, the data generated by this method are sparse and noisy because of the presence of dropout events, which can greatly impact downstream analyses such as differential gene expression, cell type annotation, and lineage trajectory reconstruction. The development of effective and robust computational methods that address both dropout and clustering is thus urgently needed. In this study, we propose a flexible, accurate two-stage algorithm for single-cell heterogeneity analysis via hierarchical clustering based on an optimal imputation strategy, called scHOIS. In the first stage, masked non-negative matrix factorization is applied to approximate the original observed scRNA-seq data, with the optimal rank determined by variance analysis. In the second stage, hierarchical clustering is applied to group the imputed cells using Pearson correlation to measure similarity, with the optimal number of clusters determined by integrating three classical indexes. We performed extensive experiments on real-world datasets, which showed that scHOIS effectively and robustly distinguished cellular differences and that the clustering performance of this algorithm was superior to that of other state-of-the-art methods.
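The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a genes-by-cells matrix in which zeros are treated as possibly missing, uses the standard weighted multiplicative-update rules for the masked factorization, keeps observed values unchanged during imputation, and uses average linkage for the hierarchy. The function names (`masked_nmf`, `impute`, `cluster_cells`) are hypothetical, and the rank and the number of clusters are passed in by hand rather than selected by variance analysis or by combining clustering indexes.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def masked_nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorize X ~ W @ H, fitting only the observed (non-zero) entries.

    Assumption: zeros are treated as missing (possible dropouts).
    """
    rng = np.random.default_rng(seed)
    M = (X > 0).astype(float)            # mask: 1 = observed, 0 = masked
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    MX = M * X
    for _ in range(n_iter):
        # Weighted multiplicative updates keep W and H non-negative.
        W *= (MX @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ MX) / (W.T @ (M * (W @ H)) + eps)
    return W, H


def impute(X, W, H):
    """Fill masked entries with the low-rank estimate; keep observed values."""
    return np.where(X > 0, X, W @ H)


def cluster_cells(X_imputed, n_clusters, method="average"):
    """Hierarchical clustering of cells (columns) on 1 - Pearson correlation."""
    D = pdist(X_imputed.T, metric="correlation")   # condensed distance vector
    Z = linkage(D, method=method)                  # linkage choice is an assumption
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

A typical call would be `W, H = masked_nmf(X, rank=5)`, then `labels = cluster_cells(impute(X, W, H), n_clusters=4)`, where the rank and cluster count are placeholder values chosen for illustration.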

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Gene Expression Profiling*
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods