Machine learning-based integrated identification of predictive combined diagnostic biomarkers for endometriosis

Front Genet. 2023 Nov 27:14:1290036. doi: 10.3389/fgene.2023.1290036. eCollection 2023.

Abstract

Background: Endometriosis (EM) is a common gynecological condition in women of reproductive age, with diverse causes and a pathogenesis that is not yet fully understood. Traditional diagnostics rely on single biomarkers and do not integrate multiple different biomarkers. This study applies multiple machine learning techniques to enhance the accuracy of predictive models. A novel diagnostic approach that combines various biomarkers provides a new clinical perspective for improving the diagnostic efficiency of endometriosis and holds significant potential for clinical application.

Methods: In this study, GSE51981 was used as a test set, and 11 machine learning algorithms (Lasso, Stepglm, glmBoost, Support Vector Machine, Ridge, Enet, plsRglm, Random Forest, LDA, XGBoost, and NaiveBayes) were employed to construct 113 predictive models for endometriosis. The optimal model was determined based on the area under the ROC curve (AUC) values derived from the various algorithms. The genes selected by the optimal model were then evaluated using nine machine learning algorithms (Random Forest, SVM, Gradient Boosting Machine, LASSO, XGBoost, NNET, Generalized Linear Model, KNN, and Decision Tree) to assess significance scores and identify diagnostic genes for each algorithm. The diagnostic value of these genes was further validated in the external datasets GSE7305, GSE11691, and GSE120103.

Results: Analysis of the GSE51981 dataset revealed 62 differentially expressed genes (DEGs). Based on AUC evaluation, the Stepglm [Both] and plsRglm algorithms identified the 30 genes with the most potential. Subsequently, nine machine learning algorithms were applied to select diagnostic genes, leading to the identification of five key diagnostic genes using the LASSO algorithm. The ADAT1 gene exhibited the best single-gene predictive performance, with an AUC of 0.785. The combination of genes (FOS, EPHX1, DLGAP5, PCSK5, and ADAT1) achieved an AUC of 0.836 in the test dataset. Moreover, these genes consistently exhibited an AUC exceeding 0.78 in all validation datasets, demonstrating superior predictive performance. Furthermore, correlation analysis with immune infiltration strengthened their predictive value by demonstrating the close relationship of the diagnostic genes with immune infiltrating cells.

Conclusion: A combination of biomarkers consisting of FOS, EPHX1, DLGAP5, PCSK5, and ADAT1 can serve as a diagnostic tool for endometriosis, enhancing diagnostic efficiency. The association of these genes with immune infiltrating cells reveals their potential role in the pathogenesis of endometriosis, providing new insights for early detection and treatment.
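
The abstract compares a single-gene AUC with the AUC of a combined gene panel. As a minimal sketch of how such a comparison can be computed, the Python below implements AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. Everything in it is an assumption made for illustration: the function name `roc_auc`, the marker names, and the toy values are hypothetical and are not data from the study, and the unweighted sum is only a stand-in for the fitted models the authors describe.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of cases (1) and controls (0).

    Equals the probability that a random case scores higher than a
    random control; tied pairs count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 4 cases followed by 4 controls (not from the study).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
marker_a = [2.0, 1.0, 3.0, 0.5, 1.5, 0.2, 0.8, 0.1]
marker_b = [0.9, 2.5, 0.4, 2.0, 0.3, 1.0, 0.5, 0.6]

# Unweighted sum as a simple combined score; a fitted model
# (for example, a penalized regression) would learn weights instead.
combined = [a + b for a, b in zip(marker_a, marker_b)]

print("marker_a AUC:", roc_auc(labels, marker_a))
print("marker_b AUC:", roc_auc(labels, marker_b))
print("combined AUC:", roc_auc(labels, combined))
```

In this toy example each marker separates the groups imperfectly, while the combined score separates them fully. Real expression data would not be expected to behave this cleanly, and a combined model should be evaluated on independent validation datasets, as the abstract reports doing.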

Keywords: combined biomarkers; diagnostic; endometriosis; machine learning; predictive.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by grants from the Malaysian Government Fundamental Research Grant Scheme (FRGS 2019; 203 CIPPT 6711727), the Universiti Sains Malaysia Short Term Grant (ST; 304 CIPPT 6315469) and the National Natural Science Foundation of China (81960877 and 82104909). Funding also came from the University Innovation Fund of Gansu Province (No. 2021A-076), the Gansu Province Science and Technology Plan (Innovation Base and Talent Plan) Project (No. 21JR7RA561), the Special Open Project of Gansu Research Center of Traditional Chinese Medicine (No. zyzx-2020-zx10), the Natural Science Foundation of Gansu Province (No. 21JR1RA267), the Education Technology Innovation Project of Gansu Province (No. 2022A-067), the Innovation Fund of Higher Education of Gansu Province (No. 2023A-088) and the Natural Science Foundation of Gansu Province (No. 22JR5RA582). Further funding was awarded under the Gansu Province Science and Technology Plan International Cooperation Field Project (No. 23YFWA0005) and the Project of Chinese Medicine Science and Technology Program of Zhejiang Province (2022ZB127). These funds were instrumental in the conceptualization and design of the work reported in this study. The funders also provided input during the initial stages of study planning.