Computational Efficient Approximations of the Concordance Probability in a Big Data Setting

Robin Van Oirbeek; Jolien Ponnet; Bart Baesens; Tim Verdonck

doi:10.1089/big.2022.0107

Big Data. 2023 Jun 7. doi: 10.1089/big.2022.0107. Online ahead of print.

Authors

Robin Van Oirbeek¹, Jolien Ponnet², Bart Baesens³,⁴, Tim Verdonck²,⁵

Affiliations

¹ Data Office, Allianz Benelux, Brussels, Belgium.
² Department of Mathematics, Faculty of Science, KU Leuven, Leuven, Belgium.
³ Faculty of Economics and Business, KU Leuven, Leuven, Belgium.
⁴ School of Management, University of Southampton, Southampton, United Kingdom.
⁵ Department of Mathematics, Faculty of Science, UAntwerp-imec, Antwerp, Belgium.

PMID: 37289184

Abstract

Performance measurement is an essential task once a statistical model has been built. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. For a binary classifier, the AUC equals the concordance probability, a frequently used measure of a model's discriminatory power. Unlike the AUC, the concordance probability can also be extended to a continuous response variable. Because the concordance probability is defined over pairs of observations, a naive computation scales quadratically with the number of observations; on today's very large data sets it therefore requires a tremendous amount of computation and is extremely time consuming, especially for a continuous response variable. We therefore propose two estimation methods that calculate the concordance probability quickly and accurately and that can be applied in both the discrete and the continuous setting. Extensive simulation studies show the excellent performance and fast computing times of both estimators. Finally, experiments on two real-life data sets confirm the conclusions of the simulation studies.
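To make the pairwise definition and the binary-case equality with the AUC concrete, the following is a minimal illustrative sketch. It is not one of the two estimators proposed in the paper. It computes the exact concordance probability by brute-force enumeration of all pairs, which is the quadratic-cost computation the abstract describes. It also computes the binary-classifier AUC through the Mann-Whitney rank-sum formula. The function names `concordance_probability` and `auc_rank` are hypothetical. Counting a tie in the predicted score as half a concordant pair is an assumed (though common) convention.

```python
from itertools import combinations
from typing import Sequence


def concordance_probability(y: Sequence[float], score: Sequence[float]) -> float:
    """Exact concordance probability by brute-force pair enumeration (O(n^2)).

    Hypothetical helper for illustration. A pair (i, j) with y[i] != y[j] is
    concordant when the observation with the larger response also has the
    larger score; score ties count as 1/2 (assumed convention).
    Works for a binary or a continuous response.
    """
    if len(y) != len(score):
        raise ValueError("y and score must have the same length")
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue  # pairs tied on the response carry no ranking information
        usable += 1
        # orient the pair so that `hi` is the observation with the larger response
        hi, lo = (i, j) if y[i] > y[j] else (j, i)
        if score[hi] > score[lo]:
            concordant += 1.0
        elif score[hi] == score[lo]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("need at least one pair with distinct responses")
    return concordant / usable


def auc_rank(labels: Sequence[int], score: Sequence[float]) -> float:
    """Binary AUC via the Mann-Whitney rank-sum formula (hypothetical helper).

    Sorting makes this O(n log n); tied scores receive their average rank,
    which reproduces the 1/2 credit for ties used above.
    """
    n = len(score)
    order = sorted(range(n), key=lambda k: score[k])
    ranks = [0.0] * n
    k = 0
    while k < n:
        m = k
        # extend the block of tied scores starting at position k
        while m + 1 < n and score[order[m + 1]] == score[order[k]]:
            m += 1
        avg_rank = (k + m) / 2 + 1  # average 1-based rank of the tied block
        for t in range(k, m + 1):
            ranks[order[t]] = avg_rank
        k = m + 1
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative labels")
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    # Mann-Whitney U statistic divided by the number of (positive, negative) pairs
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, with `labels = [0, 0, 1, 1, 0, 1]` and `scores = [0.1, 0.4, 0.35, 0.8, 0.4, 0.9]`, both functions return 7/9 (about 0.778). The brute-force version handles a continuous response unchanged, but its pair loop is exactly what becomes prohibitive on large data sets.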

Keywords: AUC; C-index; clustering; efficient algorithm; performance measure.