Geographically weighted linear combination test for gene-set analysis of a continuous spatial phenotype as applied to intratumor heterogeneity

Front Cell Dev Biol. 2023 Mar 9:11:1065586. doi: 10.3389/fcell.2023.1065586. eCollection 2023.

Abstract

Background: The impact of gene-sets on a spatial phenotype is not necessarily uniform across different locations of cancer tissue. This study introduces a computational platform, GWLCT, that combines gene-set analysis with spatial data modeling to provide a new statistical test for location-specific association of phenotypes and molecular pathways in spatial single-cell RNA-seq data collected from an input tumor sample.

Methods: The main advantage of GWLCT is that it goes beyond global significance, allowing the association between the gene-set and the phenotype to vary across the tumor space. At each location, the most significant linear combination is found using a geographically weighted shrunken covariance matrix and a kernel function. Whether a fixed or an adaptive bandwidth is used is determined by a cross-validation procedure. Our proposed method is compared to the global version of the linear combination test (LCT) and to bulk and random-forest-based gene-set enrichment analyses, using data created by the Visium Spatial Gene Expression technique on an invasive breast cancer tissue sample as well as 144 different simulation scenarios.

Results: In an illustrative example, the new geographically weighted linear combination test, GWLCT, identifies the cancer hallmark gene-sets that are significantly associated at each location with the five spatially continuous phenotypic contexts in the tumors defined by different well-known markers of cancer-associated fibroblasts. Scan statistics revealed clustering in the number of significant gene-sets. A spatial heatmap of combined significance over all selected gene-sets is also produced. Extensive simulation studies demonstrate that our proposed approach outperforms the other methods in the considered scenarios, especially when the spatial association increases.

Conclusion: Our proposed approach considers the spatial covariance of gene expression to detect the most significant gene-sets affecting a continuous phenotype. It reveals spatially detailed information in tissue space and can thus play a key role in understanding the contextual heterogeneity of cancer cells.
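
To make the idea of a geographically weighted shrunken covariance matrix concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a Gaussian kernel, a fixed shrinkage intensity toward the diagonal, and the common geographically weighted regression convention that an adaptive bandwidth is a neighbour count while a fixed bandwidth is a distance. The function names `kernel_weights` and `weighted_shrunken_cov` and all parameter values are hypothetical; the abstract does not specify the kernel or the shrinkage estimator that GWLCT uses.

```python
import numpy as np


def kernel_weights(coords, focal, bandwidth, adaptive=False):
    """Gaussian kernel weights of every spot relative to one focal location.

    Assumption: with adaptive=True, `bandwidth` is a neighbour count k and the
    kernel width is the distance from the focal point to its k-th nearest spot;
    otherwise `bandwidth` is a fixed distance.
    """
    d = np.linalg.norm(coords - focal, axis=1)
    if adaptive:
        k = min(int(bandwidth), len(d) - 1)
        h = np.sort(d)[k]
    else:
        h = float(bandwidth)
    h = max(h, np.finfo(float).tiny)  # guard against a zero kernel width
    return np.exp(-0.5 * (d / h) ** 2)


def weighted_shrunken_cov(X, w, shrinkage=0.2):
    """Weighted covariance of X (spots x genes), shrunk toward its diagonal.

    Assumption: a fixed shrinkage intensity is used here for illustration;
    a data-driven intensity could be substituted.
    """
    w = w / w.sum()                      # normalise weights to sum to 1
    mu = w @ X                           # weighted mean expression per gene
    Xc = X - mu
    S = (Xc * w[:, None]).T @ Xc         # weighted covariance (genes x genes)
    target = np.diag(np.diag(S))         # diagonal shrinkage target
    return (1.0 - shrinkage) * S + shrinkage * target


# Synthetic data for illustration only: 50 spots, 80 genes in the gene-set.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(50, 2))
X = rng.normal(size=(50, 80))

w_fixed = kernel_weights(coords, coords[0], bandwidth=2.0)
w_adapt = kernel_weights(coords, coords[0], bandwidth=15, adaptive=True)
cov_local = weighted_shrunken_cov(X, w_fixed, shrinkage=0.2)
```

Because the gene-set here has more genes than spots, the unshrunk weighted covariance is singular; shrinking toward the diagonal keeps the local matrix invertible, which is what a location-specific linear combination test needs.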

Keywords: Spatial single cell analysis; cancer-associated fibroblast; gene-set analysis; geographically weighted regression; intratumor heterogeneity; linear combination test.

Grants and funding

This research was supported by a Fellowship from the Mathematics of Information Technology and Complex Systems Accelerate program (grant number: RES0047324, recipient: ID).