Calibrating machine learning approaches for probability estimation: A comprehensive comparison

Francisco M Ojeda; Max L Jansen; Alexandre Thiéry; Stefan Blankenberg; Christian Weimar; Matthias Schmid; Andreas Ziegler

doi:10.1002/sim.9921

Stat Med. 2023 Dec 20;42(29):5451-5478. doi: 10.1002/sim.9921. Epub 2023 Oct 17.

Authors

Francisco M Ojeda¹ ², Max L Jansen³, Alexandre Thiéry³, Stefan Blankenberg¹ ² ⁴, Christian Weimar⁵ ⁶, Matthias Schmid⁷, Andreas Ziegler¹ ² ³ ⁸ ⁹

Affiliations

¹ Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
² Centre for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
³ Cardio-CARE, Medizincampus Davos, Davos, Switzerland.
⁴ German Center for Cardiovascular Research (DZHK), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany.
⁵ BDH-Klinik Elzach, Baden-Wuerttemberg, Germany.
⁶ Institute for Medical Informatics, Biometry and Epidemiology, University of Duisburg-Essen, Essen, North Rhine-Westphalia, Germany.
⁷ Institute of Medical Biometry, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Bonn, North Rhine-Westphalia, Germany.
⁸ School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa.
⁹ Swiss Institute of Bioinformatics, Lausanne, Waadt, Switzerland.

PMID: 37849356
DOI: 10.1002/sim.9921

Abstract

Statistical prediction models have gained popularity in applied research. One challenge is transferring a prediction model to a population that may be structurally different from the one for which the model was developed. The model can be adapted to the new population by calibrating it to the characteristics of the target population, and numerous calibration techniques exist for this purpose. In view of this diversity, we performed a systematic evaluation of popular calibration approaches used by the statistical and machine learning communities for estimating two-class probabilities. In this work, we first provide a review of the literature and, second, present the results of a comprehensive simulation study. The calibration approaches are compared with respect to their empirical properties and relationships, their ability to generalize precise probability estimates to external populations, and their availability in easy-to-use software implementations. Third, we provide code from a real data analysis so that researchers can apply the methods. Logistic calibration and beta calibration, which estimate an intercept plus one and two slope parameters, respectively, consistently showed the best results in the simulation studies. Calibration on logit-transformed probability estimates generally outperformed calibration on nontransformed estimates. When training and validation data differ structurally, re-estimation of the entire prediction model should be weighed against the sample size of the validation data. For updating probability estimates in validation studies, we recommend regression-based calibration approaches that use transformed probability estimates and estimate at least one slope in addition to an intercept.

Keywords: calibration; logistic regression; machine learning; probability estimation; probability machine; updating.
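
To make the two recommended approaches concrete, the following is a minimal sketch. It is not the authors' published code. It assumes the usual formulations: logistic calibration regresses the outcome on logit(p) (intercept plus one slope), and beta calibration regresses it on ln(p) and -ln(1 - p) (intercept plus two slopes). All function names, the simulated data, and the Newton-Raphson fitting routine are illustrative assumptions chosen to keep the example dependency-free. The beta calibration here is fitted without the nonnegativity constraint on the slopes that some implementations impose.

```python
import math
import random

EPS = 1e-12  # keeps probabilities away from exactly 0 and 1

def clip(p):
    return min(max(p, EPS), 1.0 - EPS)

def logit(p):
    p = clip(p)
    return math.log(p / (1.0 - p))

def sigmoid(z):
    # Numerically stable inverse logit.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small linear system.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    # Newton-Raphson maximum likelihood fit; each row of X starts with 1.0.
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        # Tiny diagonal term only for numerical stability of the solve.
        hess = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
        for x, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = mu * (1.0 - mu)
            for i in range(k):
                grad[i] += (yi - mu) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta

def logistic_calibration(p_hat, y):
    # Intercept plus one slope on the logit-transformed estimates.
    a, b = fit_logistic([[1.0, logit(p)] for p in p_hat], y)
    return lambda p: sigmoid(a + b * logit(p))

def beta_calibration(p_hat, y):
    # Intercept plus two slopes, on ln(p) and -ln(1 - p) (unconstrained here).
    def feats(p):
        p = clip(p)
        return [1.0, math.log(p), -math.log(1.0 - p)]
    coef = fit_logistic([feats(p) for p in p_hat], y)
    return lambda p: sigmoid(sum(c * v for c, v in zip(coef, feats(p))))

# Simulated validation data (an assumption for illustration only):
# the raw estimates are overconfident and shifted relative to the truth.
random.seed(0)
true_q = [random.uniform(0.05, 0.95) for _ in range(2000)]
outcome = [1 if random.random() < q else 0 for q in true_q]
raw_p = [sigmoid(2.0 * logit(q) + 1.0) for q in true_q]

logistic_cal = logistic_calibration(raw_p, outcome)
beta_cal = beta_calibration(raw_p, outcome)

def mean_abs_error(f):
    return sum(abs(f(p) - q) for p, q in zip(raw_p, true_q)) / len(true_q)

print("uncalibrated:", round(mean_abs_error(lambda p: p), 3))
print("logistic:    ", round(mean_abs_error(logistic_cal), 3))
print("beta:        ", round(mean_abs_error(beta_cal), 3))
```

In this toy setup the miscalibration is linear on the logit scale, so both calibrators can in principle recover the true probabilities. Beta calibration nests logistic calibration as the special case in which the two slopes are equal.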

Publication types

Review

MeSH terms

Humans
Logistic Models
Machine Learning*
Models, Statistical*
Probability
Software