A diagnostic model based on bioinformatics and machine learning to differentiate bipolar disorder from schizophrenia and major depressive disorder

Schizophrenia (Heidelb). 2024 Feb 14;10(1):16. doi: 10.1038/s41537-023-00417-1.

Abstract

Bipolar disorder (BD) has the highest suicide rate of all psychiatric disorders, yet its underlying causative genes and effective treatments remain unclear. BD is often confused with schizophrenia (SC) and major depressive disorder (MDD) during diagnosis. As a result, patients may receive inadequate or inappropriate treatment, which worsens their prognosis. This study aims to establish a diagnostic model that distinguishes BD from SC and MDD across multiple public datasets using bioinformatics and machine learning, and to provide new ideas for diagnosing BD in the future. Three brain tissue datasets containing BD, SC, and MDD samples were chosen from the Gene Expression Omnibus (GEO) database, and two peripheral blood datasets were selected for validation. Linear Models for Microarray Data (limma) analysis was carried out to identify differentially expressed genes (DEGs), and functional enrichment analysis and machine learning were then applied to the DEGs. Least absolute shrinkage and selection operator (LASSO) regression was employed to identify candidate immune-associated hub genes. Protein-protein interaction (PPI) networks were constructed, artificial neural networks (ANN) were built for validation, receiver operating characteristic (ROC) curves were plotted to assess how well the genes differentiate BD from SC and MDD, and immune cell infiltration analysis was performed to study immune cell dysregulation in the three diseases. RBM10 was obtained as a candidate gene to distinguish BD from SC, and five candidate genes (LYPD1, HMBS, HEBP2, SETD3, and ECM2) were obtained to distinguish BD from MDD. Both gene sets were validated with the ANN, and their diagnostic value was assessed with ROC curves. The results indicated that the prediction model has promising diagnostic value. In the immune infiltration analysis, naive B cells, resting NK cells, and activated mast cells differed substantially between BD and SC, whereas naive B cells and memory B cells differed markedly between BD and MDD.
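The abstract does not state how the LASSO step was implemented. A minimal sketch is shown below, assuming a squared-error LASSO solved by cyclic coordinate descent with soft-thresholding. The authors may instead have used a logistic LASSO for the binary BD-versus-SC or BD-versus-MDD outcome. The sketch illustrates how the L1 penalty drives the coefficients of uninformative genes to exactly zero, leaving a short list of candidate genes. The names `lasso_coordinate_descent`, `soft_threshold`, and `lam`, and the toy data, are hypothetical.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; exactly zero inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    X: list of samples, each a list of gene-expression values (one per gene).
    y: numeric outcome per sample (hypothetical coding, e.g. 1 = BD, 0 = comparison).
    Genes and y are mean-centred, so no intercept term is fitted.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    beta = [0.0] * p
    residual = yc[:]  # residual = yc - Xc @ beta, and beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col_sq = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if col_sq == 0.0:
                continue  # a constant gene carries no information; keep beta_j = 0
            # Covariance of gene j with the partial residual (gene j's own effect added back)
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta  # keep the residual in sync with beta
                beta[j] = new_bj
    return beta


# Hypothetical toy data: gene 0 tracks the label, gene 1 is noise, gene 2 is constant.
X = [[1, 0.5, 2], [2, 0.1, 2], [3, 0.9, 2], [4, 0.3, 2], [5, 0.7, 2], [6, 0.2, 2]]
y = [0, 0, 0, 1, 1, 1]
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(beta)  # only gene 0 keeps a non-zero coefficient
```

Genes whose coefficients survive the penalty would be the "candidate" genes. In practice the penalty strength `lam` is usually chosen by cross-validation rather than fixed by hand.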
In this study, RBM10 was identified as a candidate gene for distinguishing BD from SC, and LYPD1, HMBS, HEBP2, SETD3, and ECM2 were identified as five candidate genes for distinguishing BD from MDD. In the ANN, these candidate genes distinguished BD from SC and from MDD well (76.923% and 81.538%, respectively).
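Diagnostic value in the abstract is assessed with ROC curves. The sketch below shows how an ROC curve and its area under the curve (AUC) can be computed from a classifier's output scores. It assumes each sample has a predicted probability of BD, for example from an ANN, and a binary label. The scores are hypothetical and are not the study's data. The AUC is computed with the Mann-Whitney formulation: the probability that a randomly chosen BD sample scores higher than a randomly chosen comparison sample, with ties counting one half.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation.

    scores: predicted probability of BD for each sample (e.g. an ANN output).
    labels: 1 for BD, 0 for the comparison group (SC or MDD).
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(FPR, TPR) pairs, lowering the decision threshold through each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical scores: three BD samples (label 1) and three comparison samples (label 0).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 8 of 9 BD/comparison pairs are ordered correctly
print(roc_points(scores, labels))
```

The area under the piecewise-linear curve traced by `roc_points`, computed with the trapezoid rule, equals the Mann-Whitney AUC. This is why the two views are interchangeable when reporting diagnostic value.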