Loan default prediction of Chinese P2P market: a machine learning methodology

Sci Rep. 2021 Sep 21;11(1):18759. doi: 10.1038/s41598-021-98361-6.

Abstract

Repayment failures by borrowers have greatly affected the sustainable development of the peer-to-peer (P2P) lending industry. Recent literature reveals that existing risk evaluation systems may ignore important signals and risk factors affecting P2P repayment. In our study, we applied four machine learning methods (random forest (RF), extreme gradient boosting tree (XGBT), gradient boosting model (GBM), and neural network (NN)) to identify important factors affecting repayment, using data from Renrendai.com in China from January 1, 2015, to June 30, 2015. The results showed that borrowers who have passed video, mobile phone, job, residence or education level verification are more likely to default on loan repayment, whereas those who have passed identity and asset certification are less likely to default. The accuracy and kappa values of all four methods exceed 90%, and RF is superior to the other classification models. Our findings demonstrate important techniques for borrower screening by P2P companies and risk regulation by regulatory agencies. Our methodology and findings will help regulators, banks and creditors combat current financial disasters caused by the coronavirus disease 2019 (COVID-19) pandemic by addressing various financial risks and translating credit scoring improvements into practice.
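
The abstract reports both accuracy and a kappa value for each classifier. As a minimal sketch of how these two metrics relate, the following Python computes them from true and predicted labels. The function names and the toy labels are hypothetical illustrations, not the authors' code or data; the kappa shown is the standard Cohen's kappa, which we assume is the statistic the abstract refers to.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance.

    Assumes at least two classes appear across the labels; if both label
    lists contain only one identical class, chance agreement is 1 and the
    ratio is undefined (a ZeroDivisionError is raised).
    """
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy labels (1 = default, 0 = repaid); not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"kappa    = {cohen_kappa(y_true, y_pred):.3f}")
```

In this toy case one of ten labels is wrong, so accuracy is 0.90 while kappa is noticeably lower (about 0.78), because kappa discounts the agreement that the class proportions alone would produce. This is why reporting kappa alongside accuracy is informative when defaults and repayments are imbalanced.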

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19 / epidemiology
  • COVID-19 / virology
  • China / epidemiology
  • Financial Management
  • Financing, Personal / economics*
  • Financing, Personal / standards
  • Humans
  • Internet
  • Machine Learning*
  • Pandemics
  • Risk Factors
  • SARS-CoV-2 / isolation & purification