Classification and prediction of spinal disease based on the SMOTE-RFE-XGBoost model

PeerJ Comput Sci. 2023 Mar 10;9:e1280. doi: 10.7717/peerj-cs.1280. eCollection 2023.

Abstract

Spinal diseases are serious conditions that cause long-term suffering, present with complex and diverse symptoms, and may lead to other conditions. At present, their diagnosis and treatment depend mainly on the professional skill and clinical experience of physicians, a limitation that the field of medicine has yet to overcome. This article proposes the SMOTE-RFE-XGBoost model, which uses the physical angles of human bones as the research indices for feature selection and classification model construction to predict spinal diseases. The research process is as follows: two groups of people, with normal and abnormal spine conditions, are taken as the study subjects, and the synthetic minority oversampling technique (SMOTE) algorithm is used to address class imbalance. Three methods, least absolute shrinkage and selection operator (LASSO), tree-based feature selection, and recursive feature elimination (RFE), are used for feature selection. Logistic regression (LR), support vector machine (SVM), naive Bayes, decision tree (DT), random forest (RF), gradient boosting tree (GBT), extreme gradient boosting (XGBoost), and ridge regression models are used to classify the samples, to construct single and combined classification models, and to rank feature importance. According to the accuracy and mean square error (MSE) values, the SMOTE-RFE-XGBoost combined model gives the best classification performance, with accuracy, MSE and F1 values of 97.56%, 0.1111 and 0.8696, respectively. Four indicators, lumbar slippage, cervical tilt, pelvic radius and pelvic tilt, had higher feature importance than the others.
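The abstract does not give implementation details, so the following is only a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for illustration; they are not the authors' code or data.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first (Euclidean distance).
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two angle-like features per subject (not from the study).
majority = [[40.0, 10.0], [42.0, 12.0], [45.0, 11.0],
            [47.0, 13.0], [50.0, 14.0], [52.0, 15.0]]
minority = [[70.0, 30.0], [72.0, 33.0], [75.0, 31.0]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be used instead of hand-written code, and oversampling is usually applied to the training split only, so that synthetic samples do not leak into the evaluation data.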

Keywords: Classification prediction; Feature selection; Machine learning; Spinal disorders; XGBoost.

Grants and funding

This research was supported by the Natural Science Foundation of Shandong Province (Grant No. ZR2021QF036) and by the “Guangyue Young Scholar Innovation Team” of Liaocheng University (Grant No. LCUGYTD2022-03). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.