Automatic Variable Selection Algorithms in Prognostic Factor Research in Neck Pain

J Clin Med. 2023 Sep 27;12(19):6232. doi: 10.3390/jcm12196232.

Abstract

This study aims to compare the variable selection strategies of different machine learning (ML) and statistical algorithms in the prognosis of neck pain (NP) recovery. A total of 3001 participants with NP were included. Three dichotomous outcomes of an improvement in NP, arm pain (AP), and disability at 3 months follow-up were used. Twenty-five variables (twenty-eight parameters) were included as predictors. There were more parameters than variables, as some categorical variables had >2 levels. Eight modelling techniques were compared: stepwise regression based on unadjusted p values (stepP), on adjusted p values (stepPAdj), on Akaike information criterion (stepAIC), best subset regression (BestSubset) least absolute shrinkage and selection operator [LASSO], Minimax concave penalty (MCP), model-based boosting (mboost), and multivariate adaptive regression splines (MuARS). The algorithm that selected the fewest predictors was stepPAdj (number of predictors, p = 4 to 8). MuARS was the algorithm with the second fewest predictors selected (p = 9 to 14). The predictor selected by all algorithms with the largest coefficient magnitude was "having undergone a neuroreflexotherapy intervention" for NP (β = from 1.987 to 2.296) and AP (β = from 2.639 to 3.554), and "Imaging findings: spinal stenosis" (β = from -1.331 to -1.763) for disability. Stepwise regression based on adjusted p-values resulted in the sparsest models, which enhanced clinical interpretability. MuARS appears to provide the optimal balance between model sparsity whilst retaining high predictive performance across outcomes. Different algorithms produced similar performances but resulted in a different number of variables selected. Rather than relying on any single algorithm, confidence in the variable selection may be increased by using multiple algorithms.

Keywords: machine learning; neck pain; prognosis; statistics; variable selection.

Grants and funding

This research received no external funding.