Foundations of Feature Selection in Clinical Prediction Modeling

Acta Neurochir Suppl. 2022;134:51-57. doi: 10.1007/978-3-030-85292-4_7.

Abstract

Selecting a set of features to include in a clinical prediction model is not always a simple task. One must balance the goal of creating a parsimonious, low-complexity model against that of upholding predictive performance, i.e., explaining a large proportion of the variance in the dependent variable. With this aim, one must consider the clinical setting and which data are readily available to clinicians at specific timepoints, as well as more obvious aspects such as the availability of computational power and the size of the training dataset. This chapter elucidates the importance of, and pitfalls in, feature selection, focusing on applications in clinical prediction modeling. We demonstrate simple methods such as correlation-, significance-, and variable importance-based filtering, as well as intrinsic feature selection methods such as the Lasso and tree- or rule-based methods. Finally, we focus on two algorithmic wrapper methods for feature selection that are commonly used in machine learning: Recursive Feature Elimination (RFE), which can be applied regardless of data and model type, and Purposeful Variable Selection as described by Hosmer and Lemeshow, specifically for generalized linear models.
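As a brief illustration of the wrapper approach described above, the following sketch applies Recursive Feature Elimination with scikit-learn. The dataset is synthetic and the choice of logistic regression as the base estimator is an assumption for demonstration purposes, not the chapter's specific implementation; in a clinical application, the feature matrix would hold patient-level predictors and the target a clinical outcome.

```python
# Illustrative sketch of Recursive Feature Elimination (RFE).
# Assumptions: synthetic data stands in for clinical data, and
# logistic regression is used as the base estimator.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic dataset: 10 candidate features, only 3 of which are informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# RFE repeatedly fits the estimator, ranks features by the magnitude of
# their coefficients, and drops the least important feature until the
# requested number of features remains.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the retained features
print(selector.ranking_)   # rank 1 marks a selected feature
```

Because RFE only requires that the base estimator expose feature importances or coefficients, the same pattern applies to tree-based models and other estimators, consistent with the chapter's point that RFE is largely model-agnostic.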

Keywords: Artificial intelligence; Feature selection; Foundations; Machine Learning; Methods; Recursive feature elimination.

MeSH terms

  • Algorithms*
  • Machine Learning
  • Models, Statistical
  • Prognosis
  • Support Vector Machine*