Regularization approaches in clinical biostatistics: A review of methods and their applications

Sarah Friedrich; Andreas Groll; Katja Ickstadt; Thomas Kneib; Markus Pauly; Jörg Rahnenführer; Tim Friede

doi:10.1177/09622802221133557

Stat Methods Med Res. 2023 Feb;32(2):425-440. doi: 10.1177/09622802221133557. Epub 2022 Nov 16.

Authors

Sarah Friedrich¹ ², Andreas Groll³, Katja Ickstadt³, Thomas Kneib⁴, Markus Pauly³, Jörg Rahnenführer³, Tim Friede⁵ ⁶

Affiliations

¹ Institute of Mathematics, University of Augsburg, Augsburg, Germany.
² Centre for Advanced Analytics and Predictive Sciences, University of Augsburg, Augsburg, Germany.
³ Department of Statistics, TU Dortmund University, Dortmund, Germany.
⁴ Chair of Statistics and Campus Institute Data Science, Georg-August-University Göttingen, Göttingen, Germany.
⁵ Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany.
⁶ DZHK (German Center for Cardiovascular Research), partner site Göttingen, Göttingen, Germany.

Abstract

A range of regularization approaches has been proposed in the data sciences to overcome overfitting, to exploit sparsity, or to improve prediction. Using a broad definition of regularization, namely controlling model complexity by adding information in order to solve ill-posed problems or to prevent overfitting, we review a range of approaches within this framework, including penalization, early stopping, ensembling, and model averaging. We discuss aspects of their practical implementation, including available R packages, and provide examples. To assess the extent to which these approaches are used in medicine, we conducted a review of three general medical journals. It revealed that regularization approaches are rarely applied in practical clinical applications, with the exception of random effects models. Hence, we suggest a more frequent use of regularization approaches in medical research. In situations where other approaches also work well, the only downside of regularization approaches is the increased complexity of conducting the analyses, which can pose challenges in terms of computational resources and the expertise of the data analyst. In our view, both can and should be overcome by investments in appropriate computing facilities and educational resources.
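As a rough illustration of the penalization idea named in the abstract, the following minimal sketch shows how adding a ridge penalty makes an otherwise ill-posed least-squares problem solvable. It is not taken from the article: the simulated data, the helper name `ridge_fit`, and the penalty values are assumptions chosen for illustration, and Python with NumPy is used here rather than the R packages the review discusses.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) beta = X'y.

    Hypothetical helper for illustration; lam > 0 is the penalty strength.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# Simulated data (an assumption): more predictors than observations,
# so the unpenalized normal equations have no unique solution.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# X'X is a p x p matrix of rank at most n, hence singular when p > n.
rank = np.linalg.matrix_rank(X.T @ X)

# The penalty term lam * I makes the system uniquely solvable.
beta_weak = ridge_fit(X, y, lam=0.1)
beta_strong = ridge_fit(X, y, lam=10.0)

print("rank of X'X:", rank, "out of", p)
print("coefficient norm, lam=0.1: ", np.linalg.norm(beta_weak))
print("coefficient norm, lam=10.0:", np.linalg.norm(beta_strong))
```

A larger penalty shrinks the coefficient vector more strongly towards zero. In practice the penalty strength would be chosen by a data-driven procedure such as cross-validation, which this sketch leaves out.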

Keywords: Bayesian inference; Penalization; early stopping; ensembling; evidence synthesis; model averaging.

Publication types

Review