Quantifying the clusterness and trajectoriness of single-cell RNA-seq data

PLoS Comput Biol. 2024 Feb 28;20(2):e1011866. doi: 10.1371/journal.pcbi.1011866. eCollection 2024 Feb.

Abstract

Among existing computational algorithms for single-cell RNA-seq analysis, clustering and trajectory inference are two major types of analysis that are routinely applied. For a given dataset, clustering and trajectory inference can generate vastly different visualizations that lead to very different interpretations of the data. To address this issue, we propose multiple scores to quantify the "clusterness" and "trajectoriness" of single-cell RNA-seq data, in other words, whether the data look like a collection of distinct clusters or a continuous progression trajectory. The scores we introduce are based on pairwise distance distribution, persistent homology, vector magnitude, Ripley's K, and degrees of connectivity. Using simulated datasets, we demonstrate that the proposed scores effectively differentiate between cluster-like data and trajectory-like data. Using real single-cell RNA-seq datasets, we demonstrate that the scores can serve as indicators of whether clustering analysis or trajectory inference is the more appropriate choice for biological interpretation of the data.
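As a rough illustration of one ingredient named above, the following is a minimal sketch of a naive Ripley's K estimate on a 2D embedding. It is not the scoring method of the paper, whose details are not given in this abstract. The function name `naive_ripley_k`, the absence of edge correction, the normalization by n(n-1), and the toy point sets are all assumptions made for illustration only.

```python
import numpy as np


def naive_ripley_k(points, radii, area):
    """Naive Ripley's K estimate with no edge correction (illustrative only).

    points : (n, d) array of coordinates, e.g. a 2D embedding of cells
    radii  : iterable of radii r at which to evaluate K(r)
    area   : area (or volume) of the observation window, supplied by the caller
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    # All pairwise Euclidean distances via broadcasting.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep ordered pairs (i, j) with i != j.
    pair_dists = dists[~np.eye(n, dtype=bool)]
    # K(r) = area * (number of ordered pairs within r) / (n * (n - 1))
    return np.array(
        [area * np.count_nonzero(pair_dists <= r) / (n * (n - 1)) for r in radii]
    )


# Hypothetical toy data in the unit square: two tight blobs and a noisy curve.
rng = np.random.default_rng(0)
blobs = np.vstack([
    rng.normal(loc=(0.25, 0.25), scale=0.03, size=(100, 2)),
    rng.normal(loc=(0.75, 0.75), scale=0.03, size=(100, 2)),
])
t = rng.uniform(0.0, 1.0, size=200)
curve = np.column_stack([t, 0.5 + 0.3 * np.sin(2 * np.pi * t)])
curve += rng.normal(scale=0.01, size=curve.shape)

radii = np.array([0.05, 0.10, 0.20])
print("blobs  K(r):", naive_ripley_k(blobs, radii, area=1.0))
print("curve  K(r):", naive_ripley_k(curve, radii, area=1.0))
print("pi r^2     :", np.pi * radii ** 2)  # expectation under complete spatial randomness in 2D
```

Comparing K(r) with πr² shows how far a point set departs from complete spatial randomness at each scale. How such a curve would be turned into a single clusterness or trajectoriness score is a design choice that this sketch does not attempt to reproduce.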

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Gene Expression Profiling
  • Sequence Analysis, RNA
  • Single-Cell Analysis*
  • Single-Cell Gene Expression Analysis*

Grants and funding

This publication is part of the Gut Cell Atlas Crohn’s Disease Consortium funded by The Leona M. and Harry B. Helmsley Charitable Trust and is supported by a grant from Helmsley to Georgia Institute of Technology (www.helmsleytrust.org/gut-cell-atlas/). This work was also supported by the National Science Foundation (CCF2007029). The funders had no role in study design, data collection and analysis, the decision to publish, or the preparation of the manuscript.