A clustering method for small scRNA-seq data based on subspace and weighted distance

Zilan Ning; Zhijun Dai; Hongyan Zhang; Yuan Chen; Zheming Yuan

doi:10.7717/peerj.14706

PeerJ. 2023 Jan 23;11:e14706. doi: 10.7717/peerj.14706. eCollection 2023.

Authors

Zilan Ning¹ ², Zhijun Dai¹, Hongyan Zhang², Yuan Chen¹, Zheming Yuan¹

Affiliations

¹ Hunan Engineering & Technology Research Centre for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha, Hunan, China.
² Hunan Agricultural University, College of Information and Intelligence, Changsha, Hunan, China.

Abstract

Background: Identifying cell types with unsupervised methods is essential for scRNA-seq research. However, conventional similarity measures make single-cell data clustering difficult because the data are high-dimensional and noisy and have a high dropout rate.

Methods: We proposed a clustering method for small scRNA-seq data based on Subspace and Weighted Distance (SSWD), which follows the assumption that gene subspaces composed of genes with similar density distributions can better distinguish cell groups. To accurately capture the intrinsic relationships among cells or genes, a new distance metric that combines the Euclidean and Pearson distances through a weighting strategy was proposed. The relative Calinski-Harabasz (CH) index was used instead of the CH index to estimate the number of clusters because the relative index is comparable across degrees of freedom.
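The idea of mixing Euclidean and Pearson distances can be illustrated with a minimal sketch. The abstract does not give the weighting strategy, so the fixed mixing weight `w`, the rescaling of both distances to [0, 1], and the function names below are assumptions made for illustration only, not the authors' EP_dis definition.

```python
import numpy as np


def euclidean_distance_matrix(X):
    """Pairwise Euclidean distances between the rows (cells) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pearson_distance_matrix(X):
    """Pearson distance, 1 - r, between the rows (cells) of X.

    Rows with zero variance would produce NaN and should be filtered first.
    """
    return 1.0 - np.corrcoef(X)


def weighted_ep_distance(X, w=0.5):
    """Hypothetical convex combination of the two distances.

    ASSUMPTION: both matrices are rescaled to [0, 1] and mixed with a
    fixed weight w; the paper's actual weighting scheme may differ.
    """
    d_e = euclidean_distance_matrix(X)
    d_p = pearson_distance_matrix(X) / 2.0  # Pearson distance lies in [0, 2]
    max_e = d_e.max()
    if max_e > 0:
        d_e = d_e / max_e
    return w * d_e + (1.0 - w) * d_p


# Toy expression matrix: 3 cells x 4 genes (made-up values).
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # same profile shape as cell 0, larger magnitude
    [4.0, 3.0, 2.0, 1.0],   # reversed profile
])
D = weighted_ep_distance(X, w=0.5)
```

In this toy example, cells 0 and 1 are perfectly correlated but far apart in Euclidean terms, so the combined distance lies between the two views; setting `w=0` or `w=1` recovers the pure (rescaled) Pearson or Euclidean distance.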

Results: We compared SSWD with seven prevailing methods on eight publicly available scRNA-seq datasets. The experimental results show that SSWD achieves better clustering accuracy and partitions cell groups more effectively. SSWD can be downloaded at https://github.com/ningzilan/SSWD.

Keywords: Consensus clustering; EP_dis; Marker gene; Subspace; scRNA-seq.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cluster Analysis
Gene Expression Profiling* / methods
Sequence Analysis, RNA / methods
Single-Cell Analysis / methods
Single-Cell Gene Expression Analysis*

Grants and funding

This work was supported by the Natural Science Foundation of Hunan Province (2021JJ30351), the Scientific Research Project of Hunan Provincial Department of Education (21B0187), and the Special Funds for Construction of Innovative Provinces in Hunan Province (2021NK1011). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.