Identification of ZMYND19 as a novel biomarker of colorectal cancer: RNA-sequencing and machine learning analysis

J Cell Commun Signal. 2023 Dec;17(4):1469-1485. doi: 10.1007/s12079-023-00779-2. Epub 2023 Jul 10.

Abstract

Colorectal cancer (CRC) is the third most common cause of cancer-related deaths. The five-year relative survival rate for CRC is estimated to be approximately 90% for patients diagnosed at an early stage and 14% for those diagnosed at an advanced stage of disease. Hence, the development of accurate prognostic markers is required. Bioinformatics enables the identification of dysregulated pathways and novel biomarkers. RNA expression profiling was performed in CRC patients from the TCGA database using a machine learning approach to identify differentially expressed genes (DEGs). Survival curves were assessed using Kaplan-Meier analysis to identify prognostic biomarkers. Furthermore, the molecular pathways, protein-protein interactions, the co-expression of DEGs, and the correlation between DEGs and clinical data were evaluated. The diagnostic markers were then determined based on machine learning analysis. The results indicated that key upregulated genes are associated with RNA processing and heterocycle metabolic processes, including C10orf2, NOP2, DKC1, BYSL, RRP12, PUS7, MTHFD1L, and PPAT. Furthermore, the survival analysis identified NOP58, OSBPL3, DNAJC2, and ZMYND19 as prognostic markers. The combineROC curve analysis indicated that the combination of C10orf2-PPAT-ZMYND19 can be considered a diagnostic marker panel, with sensitivity, specificity, and AUC values of 0.98, 1.00, and 0.99, respectively. Finally, the ZMYND19 gene was validated in CRC patients. In conclusion, novel biomarkers of CRC were identified that may support early diagnosis, potential treatment, and better prognosis.
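For readers unfamiliar with the diagnostic metrics reported above, the following is a minimal sketch of how sensitivity, specificity, and AUC can be computed for a single combined marker score. It is not the authors' pipeline: the function names, the threshold, and the toy data are all assumptions made for illustration, and the scores are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), calling a sample positive if score >= threshold.

    labels: 1 = tumor, 0 = normal (hypothetical encoding).
    """
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the ROC area).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration only): four tumor and four normal samples,
# each with a hypothetical combined score from a marker panel.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.45)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc(toy_labels, toy_scores):.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes performance across all thresholds, which is why the three values are reported together.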

Keywords: Bioinformatic analysis; Biomarker; CRC; Machine learning.