Combination of unsupervised discretization methods for credit risk

PLoS One. 2023 Nov 27;18(11):e0289130. doi: 10.1371/journal.pone.0289130. eCollection 2023.

Abstract

Creating robust and explainable statistical learning models is essential in credit risk management. For this purpose, equal-width or equal-frequency discretization is the de facto choice when building predictive models. These methods have a limitation: because they constrain the discretization procedure, underlying patterns in the data can be lost. This study introduces an approach that combines traditional discretization techniques with clustering-based discretization, specifically k-means and Gaussian mixture models. The study proposes two combinations: Discrete Competitive Combination (DCC) and Discrete Exhaustive Combination (DEC). DCC keeps, for each feature, only the discretization method that performs best on that feature, whereas DEC includes every discretization method so that each one complements the information not captured by the others. The proposed combinations were tested on 11 credit risk datasets by fitting a logistic regression model on the weight of evidence transformation of the training partition and comparing results on the validation partition. The experimental findings showed that both combinations perform similarly to each other and outperform the individual methods for logistic regression without compromising computational efficiency. More importantly, the proposed method is a feasible and competitive alternative to conventional methods that does not reduce explainability.
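The abstract does not spell out the procedure, so the following is only a minimal sketch of the idea for a single feature, not the authors' implementation. It assumes NumPy only, the weight of evidence convention ln(share of non-events / share of events), additive smoothing of 0.5 per bin, and information value as the per-feature selection criterion for DCC (the abstract does not name the criterion). All function names are hypothetical, Gaussian mixture discretization is omitted for brevity, and the encoding is fitted on one sample rather than on a separate training partition.

```python
import numpy as np

def equal_width_edges(x, n_bins):
    # Interior cut points that split the observed range into equal-width bins.
    return np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]

def equal_frequency_edges(x, n_bins):
    # Interior cut points at evenly spaced quantiles (duplicates dropped).
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    return np.unique(np.quantile(x, qs))

def kmeans_edges(x, n_bins, n_iter=50):
    # 1-D Lloyd's algorithm; cut points are midpoints between sorted centers.
    centers = np.quantile(x, (np.arange(n_bins) + 0.5) / n_bins)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for k in range(n_bins):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    centers = np.sort(centers)
    return (centers[:-1] + centers[1:]) / 2

def woe_encode(x, y, edges, eps=0.5):
    # Assumed convention: WoE = ln(share of non-events / share of events).
    bins = np.digitize(x, edges)
    n_bins = len(edges) + 1
    events = np.bincount(bins, weights=y, minlength=n_bins) + eps
    non_events = np.bincount(bins, weights=1 - y, minlength=n_bins) + eps
    p_event = events / events.sum()
    p_non_event = non_events / non_events.sum()
    woe = np.log(p_non_event / p_event)
    iv = float(np.sum((p_non_event - p_event) * woe))  # information value
    return woe[bins], iv

# Synthetic single-feature example (illustrative data, not from the study).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.random(500) < 1 / (1 + np.exp(-x))).astype(int)

methods = {
    "equal_width": equal_width_edges,
    "equal_frequency": equal_frequency_edges,
    "kmeans": kmeans_edges,
}
encoded = {name: woe_encode(x, y, fn(x, 5)) for name, fn in methods.items()}

# DCC-style: keep only the method with the highest assumed criterion (IV).
best = max(encoded, key=lambda name: encoded[name][1])
dcc_column = encoded[best][0]

# DEC-style: keep the WoE-encoded column from every method side by side.
dec_matrix = np.column_stack([encoded[name][0] for name in methods])
```

In a full pipeline, `dcc_column` or `dec_matrix` would be built per feature on the training partition and then passed to a logistic regression, with the learned cut points and WoE values reused on the validation partition.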

MeSH terms

  • Cluster Analysis
  • Learning*
  • Logistic Models
  • Models, Statistical*

Grants and funding

This study was financially supported by Universidad Iberoamericana Ciudad de México in the form of a graduate scholarship received by JF. This study was also supported by Universidad Iberoamericana Ciudad de México in the form of salary for HP. The specific role of this author is articulated in the ‘author contributions’ section. This study was also financially supported by ANID PIA BASAL in the form of an award (AFB180003) received by SM. This study was also financially supported by FONDECYT Chile in the form of a grant (1200221) received by SM. This study was also financially supported by Chairs Program of the National Council of Humanities, Science and Technology (CONAHCYT) project (2193) award received by JV. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.