A cell abundance analysis based on efficient PAM clustering for a better understanding of the dynamics of endometrial remodelling

BMC Bioinformatics. 2023 Nov 22;24(1):440. doi: 10.1186/s12859-023-05569-6.

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) is a powerful tool for investigating cell abundance changes during tissue regeneration and remodeling processes. Differential cell abundance supports the initial clustering of all cells; then, the number of cells per cluster and sample are evaluated, and the dependence of these counts concerning the phenotypic covariates of the samples is studied. Analysis heavily depends on the clustering method. Partitioning Around Medoids (PAM or k-medoids) represents a well-established clustering procedure that leverages the downstream interpretation of clusters by pinpointing real individuals in the dataset as cluster centers (medoids) without reducing dimensions. Of note, PAM suffers from high computational costs and memory requirements.

Results: This paper proposes a method for differential abundance analysis using PAM as a clustering method and negative binomial regression as a statistical model to relate covariates to cluster/cell counts. We used this approach to study the differential cell abundance of human endometrial cell types throughout the natural secretory phase of the menstrual cycle. We developed a new R package -scellpam-, that incorporates an efficient parallel C++ implementation of PAM, and applied this package in this study. We compared the PAM-BS clustering method with other methods and evaluated both the computational aspects of its implementation and the quality of the classifications obtained using distinct published datasets with known subpopulations that demonstrate promising results.

Conclusions: The implementation of PAM-BS, included in the scellpam package, exhibits robust performance in terms of speed and memory usage compared to other related methods. PAM allowed quick and robust clustering of sets of cells with a size ranging from 70,000 to 300,000 cells. https://cran.r-project.org/web/packages/scellpam/index.html . Finally, our approach provides important new insights into the transient subpopulations associated with the fertile time frame when applied to the study of changes in the human endometrium during the secretory phase of the menstrual cycle.

Keywords: Differential abundance testing; Human endometrial cell types; Partitioning around medoids; Single-cell RNA-seq.

MeSH terms

  • Cluster Analysis
  • Endometrium*
  • Female
  • Gene Expression Profiling / methods
  • Humans
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods