Using machine learning methods to study the tumour microenvironment and its biomarkers in osteosarcoma metastasis

Heliyon. 2024 Apr 5;10(7):e29322. doi: 10.1016/j.heliyon.2024.e29322. eCollection 2024 Apr 15.

Abstract

Background: The long-term prognosis for patients with osteosarcoma (OS) metastasis remains unfavourable, highlighting the urgent need for research that explores potential biomarkers using innovative methodologies.

Methods: This study explored potential biomarkers for OS metastasis by analysing data from the Cancer Genome Atlas Program (TCGA) and Gene Expression Omnibus (GEO) databases. The synthetic minority oversampling technique (SMOTE) was used to address class imbalance, and genes were selected from the gene expression matrix using four feature selection algorithms (Monte Carlo feature selection [MCFS], Boruta, minimum-redundancy maximum-relevance [mRMR], and light gradient-boosting machine [LightGBM]). Four machine learning (ML) algorithms (support vector machine [SVM], extreme gradient boosting [XGBoost], random forest [RF], and k-nearest neighbours [kNN]) were then used to determine the optimal number of genes for building the model. Interpretable machine learning (IML) was applied to construct prediction networks, revealing potential relationships among the selected genes. Additionally, enrichment analysis, survival analysis, and immune infiltration analysis were performed on the featured genes.
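The SMOTE step can be illustrated with a minimal from-scratch sketch. It is not the authors' implementation: the abstract does not state which library or parameters were used, so the neighbour count k = 5, the random choice of base sample, and the function name `smote` are assumptions. Each synthetic sample is placed at a random point on the line segment between a minority-class sample (for example, a metastatic case's expression vector) and one of its k nearest minority-class neighbours.

```python
import math
import random
from typing import List, Sequence


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(
    minority: Sequence[Sequence[float]],
    n_synthetic: int,
    k: int = 5,  # assumed neighbour count; the study's value is not reported
    seed: int = 0,
) -> List[List[float]]:
    """Create n_synthetic minority-class samples by SMOTE-style interpolation (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)

    # k nearest minority-class neighbours of each minority sample (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        ranked = sorted((_euclidean(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in ranked[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # random base sample
        j = rng.choice(neighbours[i])     # random neighbour of that sample
        gap = rng.random()                # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic
```

Interpolating between neighbours, rather than duplicating existing cases, gives the classifiers new but plausible minority points. In practice a maintained implementation such as imbalanced-learn's `SMOTE` would normally be preferred, and oversampling is usually applied only to training folds so that synthetic samples do not leak into model evaluation.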

Results: In datasets DS1, DS2, and DS3, the IML algorithm identified 53, 45, and 46 features, respectively. Using the merged gene set, we obtained a total of 79 interpretable prediction rules for OS metastasis. We then investigated in depth 39 key molecules associated with predicting OS metastasis and elucidated their roles within the tumour microenvironment. Notably, some genes acted as both predictors and differentially expressed genes. Finally, survival differed significantly between the high- and low-expression groups of TRIP4, S100A9, SELL and SLC11A1, and these genes showed some degree of correlation with 22 immune cell types.
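The survival comparison between high- and low-expression groups can be sketched as a two-group log-rank test. The abstract does not say how the groups were defined or which test was used, so the median cutoff, the log-rank test, and the helper names `logrank_test` and `median_split_survival` are illustrative assumptions. At each event time, the observed number of events in one group is compared with the number expected under equal survival, and the standardized sum is referred to a chi-square distribution with one degree of freedom.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def logrank_test(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test. Returns (chi-square statistic, p-value with 1 df).

    events_* use 1 for an observed event (e.g. death) and 0 for censoring.
    """
    # Pool both groups, tagging each subject with its group (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]
        n = len(at_risk)
        n_a = at_risk.count(0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at this event time.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def median_split_survival(
    expression: Sequence[float], times: Sequence[float], events: Sequence[int],
) -> Tuple[float, float]:
    """Split patients at the median expression (assumed cutoff) and compare survival."""
    cutoff = median(expression)
    high = [i for i, x in enumerate(expression) if x > cutoff]
    low = [i for i, x in enumerate(expression) if x <= cutoff]
    if not high or not low:
        raise ValueError("median split produced an empty group")
    return logrank_test(
        [times[i] for i in high], [events[i] for i in high],
        [times[i] for i in low], [events[i] for i in low],
    )
```

For a hypothetical gene whose high-expression patients all have early events while low-expression patients are censored late, this sketch returns a small p-value; identical groups give a statistic of zero and a p-value of 1. Published analyses would typically rely on an established survival package and report Kaplan-Meier curves alongside the test.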

Conclusions: The biomarkers discovered in this study hold significant implications for personalized therapies, potentially enhancing the clinical prognosis of patients with OS.

Keywords: Biomarkers; Feature selection; Machine learning; Osteosarcoma; Tumour microenvironment.