Predicting Physician Consultations for Low Back Pain Using Claims Data and Population-Based Cohort Data - An Interpretable Machine Learning Approach

Adrian Richter; Julia Truthmann; Jean-François Chenot; Carsten Oliver Schmidt

doi:10.3390/ijerph182212013

Int J Environ Res Public Health. 2021 Nov 16;18(22):12013. doi: 10.3390/ijerph182212013.

Authors

Adrian Richter¹, Julia Truthmann², Jean-François Chenot², Carsten Oliver Schmidt¹

Affiliations

¹ Department SHIP-KEF, Institute for Community Medicine, Greifswald University Medical Center, Walther Rathenau Str. 48, 17475 Greifswald, Germany.
² Department of Family Medicine, Institute for Community Medicine, Fleischmannstr. 42, 17475 Greifswald, Germany.

Abstract

(1) Background: Predicting chronic low back pain (LBP) is of clinical and economic interest because LBP leads to disability and health service utilization. This study aims to build a competitive and interpretable prediction model; (2) Methods: We used clinical and claims data from 3837 participants of a population-based cohort study to predict future LBP consultations (ICD-10: M40.XX-M54.XX). Best subset selection (BSS) was applied in repeated random samples of the training data (75% of data); scoring rules were used to identify the best subset of predictors. The prediction accuracy of BSS was compared with that of random forest and support vector machines (SVM) in the validation data (25% of data); (3) Results: The best subset comprised 16 of 32 predictors. Previous occurrence of LBP increased the odds of future LBP consultations (odds ratio (OR) 6.91 [5.05; 9.45]), while concomitant diseases reduced the odds (1 vs. 0, OR: 0.74 [0.57; 0.98]; >1 vs. 0, OR: 0.37 [0.21; 0.67]). The area under the curve (AUC) of BSS was acceptable (0.78 [0.74; 0.82]) and comparable with SVM (0.78 [0.74; 0.82]) and random forest (0.79 [0.75; 0.83]); (4) Conclusions: Regarding prediction accuracy, BSS can be considered competitive with established machine-learning approaches. Nonetheless, considerable misclassification is inherent and further refinements are required to improve predictions.
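
The evaluation quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not say which scoring rules were used, so the Brier score below is an assumption, chosen as one common proper scoring rule. The outcome and probability vectors are invented toy values, not study data, and the 20% baseline probability in the odds-ratio example is purely hypothetical. The reported OR of 6.91 comes from a multivariable model, so the conversion only shows how an odds ratio acts on a probability scale.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auc(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Share of (case, non-case) pairs in which the case gets the higher
    predicted probability; ties count as half a win."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Convert a baseline probability to odds, scale by the OR, convert back."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


# Toy validation data (hypothetical): 1 = future LBP consultation, 0 = none.
y_true = [1, 0, 1, 0, 0, 1]
p_pred = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]

print(round(brier_score(y_true, p_pred), 3))   # 0.178
print(round(auc(y_true, p_pred), 3))           # 0.778

# Hypothetical 20% baseline probability combined with the reported OR of 6.91.
print(round(apply_odds_ratio(0.20, 6.91), 3))  # 0.633
```

The AUC measures only how well cases are ranked above non-cases, whereas the Brier score also penalizes poorly calibrated probabilities, so the two can rank candidate models differently.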

Keywords: best subset selection; calibration; low back pain; machine learning; record linkage.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cohort Studies
Humans
Low Back Pain* / epidemiology
Machine Learning
Physicians*
Referral and Consultation