A high-efficiency differential expression method for cancer heterogeneity using large-scale single-cell RNA-sequencing data

Front Genet. 2022 Nov 29:13:1063130. doi: 10.3389/fgene.2022.1063130. eCollection 2022.

Abstract

Colorectal cancer is a highly heterogeneous disease, and tumor heterogeneity limits the efficacy of cancer treatment. Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying cancer heterogeneity at cellular resolution. However, the sparsity, heterogeneity, and fast-growing scale of scRNA-seq data challenge the flexibility, accuracy, and computing efficiency of differential expression (DE) methods. We propose HEART (high-efficiency and robust test), a statistical combination test that can detect DE genes arising from various sources of difference beyond changes in mean expression. To validate its performance, we compared HEART with six other popular DE methods on simulation datasets with different settings, generated by two simulation mechanisms. HEART achieved high accuracy (F1 score > 0.75) and high computational efficiency (less than 2 min) on multiple simulation datasets across the experimental settings. HEART also performed well on DE gene detection for the PBMC68K dataset, quantified by UMI counts, and the human brain single-cell dataset, quantified by read counts (F1 score = 0.79 and 0.65, respectively). Applying HEART to the single-cell dataset of a colorectal cancer patient, we found several potential blood-based biomarkers (CTTN, S100A4, S100A6, UBA52, FAU, and VIM) associated with colorectal cancer metastasis and validated them on additional spatial transcriptomic data from other colorectal cancer patients.

Keywords: DE gene; PBMC68K; colorectal cancer; combination test; differential analysis.
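
To make the two quantitative ideas in the abstract concrete (a combination test, and the F1 score used for evaluation), the following is a minimal sketch. The abstract does not state which component tests HEART combines or which combination rule it uses, so the Cauchy combination rule here is an assumption chosen purely for illustration, and the names `cauchy_combine` and `f1_score` are hypothetical. The idea is that each gene receives several p-values, each sensitive to a different kind of difference (for example mean shift versus change in zero proportion), and these are merged into one p-value per gene.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Merge several p-values for one gene into a single p-value.

    Illustrative only: uses the Cauchy combination rule, which is an
    assumption and not necessarily the rule used by HEART.
    """
    if weights is None:
        # Equal weights that sum to 1.
        weights = [1.0 / len(pvalues)] * len(pvalues)
    # Clip so that tan() stays finite at p = 0 or p = 1.
    eps = 1e-15
    clipped = [min(max(p, eps), 1.0 - eps) for p in pvalues]
    # Map each p-value to a Cauchy variate and take the weighted sum.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, clipped))
    # Convert the statistic back to a p-value via the Cauchy tail.
    return 0.5 - math.atan(stat) / math.pi


def f1_score(called, truth):
    """F1 score of called DE genes against a set of true DE genes."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(called)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical p-values for one gene from two component tests:
    # a strong signal in one test dominates the combined p-value.
    print(cauchy_combine([0.001, 0.8]))
    # Hypothetical gene sets for evaluation.
    print(f1_score({"A", "B", "C"}, {"B", "C", "D"}))
```

A gene with a strong signal in only one component test still receives a small combined p-value, which is the property that lets a combination test pick up differences a mean-only test would miss.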