Identification of Specific Cervical Cancer Subtypes and Prognostic Gene Sets in Tumor and Nontumor Tissues Based on GSVA Analysis

J Oncol. 2022 Oct 15:2022:6951885. doi: 10.1155/2022/6951885. eCollection 2022.

Abstract

Background: Cervical cancer is the fourth most common cancer among women, and its prognosis deserves more attention. Our purpose was to identify new prognostic gene sets that could help researchers develop more effective treatments for cervical cancer patients and improve their prognosis.

Methods: We used gene set variation analysis (GSVA) to calculate the enrichment scores of gene sets and identified three subtypes of cervical cancer through the Cox regression model, the k-means clustering algorithm, and nonnegative matrix factorization (NMF). The chi-square test was used to assess whether each clinical characteristic differed among the subtypes. We further screened the prognostic gene sets using differential analysis, univariate Cox regression analysis, and least absolute shrinkage and selection operator (LASSO) regression. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were used to determine the pathways and functions in which genes from the screened gene sets were enriched. The Search Tool for the Retrieval of Interacting Genes (STRING) was used to construct the protein-protein interaction network, and Cytoscape was used to visualize its hub genes.
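The chi-square comparison of a clinical characteristic across subtypes can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from the study; the function name `chi_square_statistic` is likewise an illustrative assumption, and only the test statistic and degrees of freedom are computed (a statistics library would normally supply the p-value).

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts.

    Each row is one subtype; each column is one category of a clinical
    characteristic (for example, a histological type).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of subtype and characteristic.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = three subtypes, columns = two categories.
counts = [[10, 20], [20, 10], [15, 15]]
stat, dof = chi_square_statistic(counts)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

A larger statistic for a given number of degrees of freedom indicates a stronger departure from independence between subtype and the characteristic.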

Results: We identified three novel subtypes of cervical cancer in The Cancer Genome Atlas (TCGA) samples and validated them in Gene Expression Omnibus (GEO) samples. The three subtypes differed significantly in histological type, T stage, M stage, and N stage. T_GSE36888_UNTREATED_VS_IL2_TREATED_STAT5_AB_KNOCKIN_TCELL_2H_UP and N_HALLMARK_ANGIOGENESIS were screened as prognostic gene sets. The prognostic model was as follows: riskScore = T_GSE36888_UNTREATED_VS_IL2_TREATED_STAT5_AB_KNOCKIN_TCELL_2H_UP × 2.617 + N_HALLMARK_ANGIOGENESIS × 4.860. Survival analysis showed that, for both gene sets, high enrichment scores were significantly associated with worse overall survival. The hub genes from the T gene set included CXCL1, CXCL2, CXCL8, ALDOA, TALDO1, LDHA, CCL4, FCAR, FCER1G, SAMSN1, LILRB1, SH3PXD2B, PPM1N, PKM, and FKBP4. For the N gene set, the hub genes included ITGAV, PTK2, SPP1, THBD, and APOH.
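As a rough illustration of how the prognostic model above could be applied, the sketch below computes riskScore as a weighted sum of the two GSVA enrichment scores. The coefficients are those reported in the abstract; the sample identifiers and enrichment scores are hypothetical, and the function name `risk_score` is an assumption made for this example.

```python
# Coefficients as reported in the prognostic model.
COEFFICIENTS = {
    "T_GSE36888_UNTREATED_VS_IL2_TREATED_STAT5_AB_KNOCKIN_TCELL_2H_UP": 2.617,
    "N_HALLMARK_ANGIOGENESIS": 4.860,
}


def risk_score(enrichment_scores):
    """Weighted sum of the GSVA enrichment scores of the two gene sets."""
    return sum(
        coefficient * enrichment_scores[gene_set]
        for gene_set, coefficient in COEFFICIENTS.items()
    )


# Hypothetical enrichment scores for two patients (not data from the study).
samples = {
    "patient_A": {
        "T_GSE36888_UNTREATED_VS_IL2_TREATED_STAT5_AB_KNOCKIN_TCELL_2H_UP": 0.10,
        "N_HALLMARK_ANGIOGENESIS": 0.20,
    },
    "patient_B": {
        "T_GSE36888_UNTREATED_VS_IL2_TREATED_STAT5_AB_KNOCKIN_TCELL_2H_UP": -0.30,
        "N_HALLMARK_ANGIOGENESIS": -0.10,
    },
}

for sample_id, scores in samples.items():
    print(sample_id, round(risk_score(scores), 4))
```

Because both coefficients are positive, a higher enrichment score in either gene set raises riskScore, which is consistent with the reported association between high enrichment scores and worse overall survival.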

Conclusions: Three novel subtypes and two prognostic gene sets were identified, along with 15 hub genes for the T gene set and 5 hub genes for the N gene set. Based on these findings, more effective treatments for cervical cancer patients can be developed, and the pathways in which these genes are enriched can guide the development of drugs that specifically target them.