Screening for Biomarkers for Progression from Oral Leukoplakia to Oral Squamous Cell Carcinoma and Evaluation of Diagnostic Efficacy by Multiple Machine Learning Algorithms

Cancers (Basel). 2022 Nov 25;14(23):5808. doi: 10.3390/cancers14235808.

Abstract

This study aimed to identify key genes involved in the progression from oral leukoplakia (OL) to oral squamous cell carcinoma (OSCC) and to evaluate their diagnostic efficacy. Weighted gene co-expression network analysis (WGCNA) and differential expression analysis identified seven genes associated with the progression from OL to OSCC. Twelve machine learning algorithms, including k-nearest neighbor (KNN), neural network (NNet), and extreme gradient boosting (XGBoost), were used to construct multi-gene models, and each model showed good diagnostic efficacy. The functional mechanisms and pathways associated with these genes were evaluated using enrichment analysis, subtype clustering, and immune infiltration analysis. Enrichment analysis showed that the genes were associated with the cell cycle, cell division, and intracellular energy metabolism. Immune infiltration analysis showed that the genes primarily affected the infiltration of proliferating T cells and macrophage polarization. Finally, a nomogram and Kaplan-Meier survival analysis were used to assess the prognostic value of the key genes in OSCC patients. The results showed that the key genes could predict patient prognosis and that patients in the high-risk group had a poor prognosis. In summary, seven key genes (DHX9, BCL2L12, RAD51, MELK, CDC6, ANLN, and KIF4A) were associated with the progression from OL to OSCC. These genes showed good diagnostic efficacy and could serve as potential biomarkers for the prognosis of OSCC patients.
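
As a rough illustration of how the diagnostic efficacy of a multi-gene model might be assessed, the sketch below fits a k-nearest neighbor classifier (one of the algorithms named above) on the seven genes and scores it by leave-one-out ROC AUC. Everything beyond the gene names and the choice of KNN is an assumption: the expression values are synthetic, and the cohort size, the value of k, the mean shift between classes, the leave-one-out scheme, and the use of AUC as the metric are illustrative choices, not the authors' data, settings, or pipeline.

```python
import math
import random

# The seven key genes named in the abstract, used here as feature names.
GENES = ["DHX9", "BCL2L12", "RAD51", "MELK", "CDC6", "ANLN", "KIF4A"]


def make_synthetic_cohort(n_per_class=30, shift=1.5, seed=0):
    """Return (X, y) of SYNTHETIC expression values, not data from the study.

    Label 0 stands for OL and label 1 for OSCC. OSCC samples are drawn with
    a higher mean for every gene, an assumption made purely for illustration.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in GENES])
            y.append(label)
    return X, y


def knn_score(train_X, train_y, sample, k=5):
    """Fraction of the k nearest training samples (Euclidean) with label 1."""
    neighbours = sorted(
        (math.dist(row, sample), label) for row, label in zip(train_X, train_y)
    )
    return sum(label for _, label in neighbours[:k]) / k


def leave_one_out_scores(X, y, k=5):
    """Score each sample with a KNN model built from all other samples."""
    scores = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(knn_score(train_X, train_y, X[i], k))
    return scores


def roc_auc(y_true, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


X, y = make_synthetic_cohort()
scores = leave_one_out_scores(X, y)
auc = roc_auc(y, scores)
print(f"Leave-one-out AUC of the KNN sketch: {auc:.3f}")
```

The same loop could be repeated for other classifiers to compare models on a common metric. With real data, expression values would typically be normalized before computing distances, and an independent validation cohort would give a less optimistic estimate than leave-one-out scoring.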

Keywords: diagnostic model; immune infiltration; machine learning; oral leukoplakia; oral squamous cell carcinoma.