Foundations of Feature Selection in Clinical Prediction Modeling

Acta Neurochir Suppl. 2022;134:51-57. doi: 10.1007/978-3-030-85292-4_7.

Abstract

Selecting a set of features to include in a clinical prediction model is not always a simple task. One must balance the goal of creating a parsimonious, low-complexity model against that of upholding predictive performance, i.e., explaining a large proportion of the variance in the dependent variable. With this aim, one must consider the clinical setting and which data are readily available to clinicians at specific timepoints, as well as more obvious aspects such as the availability of computational power and the size of the training dataset. This chapter elucidates the importance of, and pitfalls in, feature selection, focusing on applications in clinical prediction modeling. We demonstrate simple methods such as correlation-, significance-, and variable importance-based filtering, as well as intrinsic feature selection methods such as the Lasso and tree- or rule-based methods. Finally, we focus on two algorithmic wrapper methods for feature selection that are commonly used in machine learning: Recursive Feature Elimination (RFE), which can be applied regardless of data and model type, and Purposeful Variable Selection as described by Hosmer and Lemeshow, specifically for generalized linear models.
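As a brief illustration of the wrapper approach described above, the following sketch applies Recursive Feature Elimination with scikit-learn. The dataset is synthetic and the choice of logistic regression as the base estimator is an assumption for demonstration purposes, not the chapter's specific implementation; in a clinical application, the feature matrix would hold patient-level predictors and the target a clinical outcome.

```python
# Illustrative sketch of Recursive Feature Elimination (RFE).
# Assumptions: synthetic data stands in for clinical data, and
# logistic regression is used as the base estimator.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic dataset: 10 candidate features, only 3 of which are informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# RFE repeatedly fits the estimator, ranks features by the magnitude of
# their coefficients, and drops the least important feature until the
# requested number of features remains.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the retained features
print(selector.ranking_)   # rank 1 marks a selected feature
```

Because RFE only requires that the base estimator expose feature importances or coefficients, the same pattern applies to tree-based models and other estimators, consistent with the chapter's point that RFE is largely model-agnostic.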

Keywords: Artificial intelligence; Feature selection; Foundations; Machine Learning; Methods; Recursive feature elimination.

MeSH terms

  • Algorithms*
  • Machine Learning
  • Models, Statistical
  • Prognosis
  • Support Vector Machine*