A Probability-Based Models Ranking Approach: An Alternative Method of Machine-Learning Model Performance Assessment

Stanisław Gajda; Marcin Chlebus

doi:10.3390/s22176361

Sensors (Basel). 2022 Aug 24;22(17):6361. doi: 10.3390/s22176361.

Authors

Stanisław Gajda¹, Marcin Chlebus¹

Affiliation

¹ Faculty of Economic Sciences, University of Warsaw, Długa Street 44/50, 00-241 Warsaw, Poland.

Abstract

Performance measures are crucial in selecting the best machine learning model for a given problem. Estimating classical model performance measures by subsampling methods such as bagging or cross-validation has several weaknesses. The most important are the inability to test whether a difference between models is significant and the lack of interpretability. The recently proposed Elo-based Predictive Power (EPP), a meta-measure of machine learning model performance, is an attempt to address these weaknesses. However, the EPP is based on incorrect assumptions, so its estimates may not be correct. This paper introduces the Probability-based Ranking Model Approach (PMRA), a modified EPP approach with a correction that makes its estimates more reliable. PMRA is based on calculating the probability that one model achieves a better result than another, using a Mixed Effects Logistic Regression model. The empirical analysis was carried out on a real mortgage credit dataset. It included a comparison of how PMRA and state-of-the-art k-fold cross-validation ranked 49 machine learning models, an example application of the novel method to a hyperparameter tuning problem, and a comparison of PMRA and EPP indications. PMRA makes it possible to compare a newly developed algorithm with state-of-the-art algorithms based on statistical criteria. It also offers a way to select the best hyperparameter configuration and to formulate criteria for continuing the hyperparameter space search.
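
The quantity at the core of the approach, the probability that one model achieves a better result than another, can be illustrated with a minimal sketch. This is not the authors' method: the abstract states that PMRA estimates these probabilities with a Mixed Effects Logistic Regression model, whereas the sketch below substitutes the raw fraction of paired folds in which one model outscores another. The model names, the per-fold AUC values, and the helper `win_probability` are all hypothetical and chosen only for illustration.

```python
from itertools import permutations

# Hypothetical per-fold AUC scores (illustrative numbers, not from the paper).
# Assumption: fold i is the same data split for every model, so scores are paired.
fold_scores = {
    "model_a": [0.81, 0.79, 0.83, 0.80, 0.82],
    "model_b": [0.78, 0.80, 0.79, 0.77, 0.81],
    "model_c": [0.75, 0.74, 0.78, 0.76, 0.73],
}


def win_probability(scores_x, scores_y):
    """Empirical fraction of paired folds in which x beats y (ties count as half)."""
    wins = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x, y in zip(scores_x, scores_y)
    )
    return wins / len(scores_x)


# Pairwise win probabilities for every ordered pair of models.
pairwise = {
    (a, b): win_probability(fold_scores[a], fold_scores[b])
    for a, b in permutations(fold_scores, 2)
}

# A simple ranking: average win probability against all other models.
mean_win = {
    m: sum(p for (a, _), p in pairwise.items() if a == m) / (len(fold_scores) - 1)
    for m in fold_scores
}
ranking = sorted(mean_win, key=mean_win.get, reverse=True)

for (a, b), p in sorted(pairwise.items()):
    print(f"P({a} beats {b}) = {p:.2f}")
print("Ranking:", ranking)
```

Unlike a table of mean scores, each entry here reads directly as "model X wins against model Y in this share of the comparisons". Raw win fractions carry no uncertainty estimate, however, which is the gap a regression-based estimate of the same probabilities is meant to fill.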

Keywords: Elo-based Predictive Power; hyperparameters tuning; machine learning; mixed effects logistic regression; model performance assessment; model performance measures; model selection.

MeSH terms

Algorithms*
Logistic Models
Machine Learning*

Grants and funding

MNISW/2020/142/DIR/KH/Ministry of Education (Poland)