A novel method for detecting credit card fraud problems

HaiChao Du; Li Lv; Hongliang Wang; An Guo

doi:10.1371/journal.pone.0294537

PLoS One. 2024 Mar 6;19(3):e0294537. doi: 10.1371/journal.pone.0294537. eCollection 2024.

Authors

HaiChao Du¹,²,³, Li Lv¹,³, Hongliang Wang¹,³, An Guo⁴,⁵

Affiliations

¹ Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang, China.
² University of Chinese Academy of Sciences, Beijing, China.
³ Liaoning Province Digital Twin-based Interactive System Engineering Research Center, China.
⁴ School of Computer & Information Engineering, Anyang Normal University, Anyang, Henan Province, China.
⁵ Key Laboratory of Oracle Bone Inscriptions Information Processing, Ministry of Education of China, Anyang, Henan Province, China.

Abstract

Credit card fraud is a significant problem that costs billions of dollars annually. Detecting fraudulent transactions is challenging because the class distribution is imbalanced: the large majority of transactions are legitimate. Pre-processing techniques such as oversampling the minority class are commonly used to address this issue, but they often generate unrealistic or overgeneralized samples. This paper proposes a novel method for credit card fraud detection, an autoencoder with probabilistic XGBoost based on SMOTE and CGAN (AE-XGB-SMOTE-CGAN). The credit card fraud dataset is a real dataset anonymized by a bank and is highly imbalanced, with far more normal records than fraud records. An autoencoder (AE) extracts relevant features from the dataset, enhancing feature representation learning; these features are then fed into XGBoost, which classifies transactions according to a threshold. In addition, we propose a novel approach that hybridizes the Generative Adversarial Network (GAN) and the Synthetic Minority Over-Sampling Technique (SMOTE) to tackle class imbalance. Our two-phase oversampling approach involves knowledge transfer and leverages the synergies of SMOTE and GAN. Specifically, when there is not enough minority-class data for the GAN to process effectively on its own, the GAN transforms the unrealistic or overgeneralized samples generated by SMOTE into realistic data distributions. SMOTE is used to address class imbalance, and a conditional GAN (CGAN) is used to generate new, realistic data to supplement the original dataset. The AE-XGB-SMOTE-CGAN algorithm is compared with other commonly used machine learning algorithms, such as KNN and LightGBM, and shows an overall improvement of 2% in terms of the accuracy (ACC) index.
The AE-XGB-SMOTE-CGAN algorithm also outperforms KNN by 30% in terms of the Matthews correlation coefficient (MCC) index when the threshold is set to 0.35. These results indicate that the AE-XGB-SMOTE-CGAN algorithm achieves higher accuracy, true positive rate, true negative rate, and Matthews correlation coefficient, making it a promising method for detecting credit card fraud.
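
The four indices named above can all be derived from a confusion matrix once predicted fraud probabilities are cut at a threshold. The following is a minimal sketch of that evaluation step, not the authors' implementation. It assumes that a transaction is labeled as fraud when its score is greater than or equal to the threshold; the function names (confusion_counts, evaluate_at_threshold) and the toy labels and scores are hypothetical and chosen only for illustration.

```python
import math


def confusion_counts(y_true, scores, threshold):
    """Count TP, TN, FP, FN, labeling a sample as fraud (1) when score >= threshold.

    The ">=" decision rule is an assumption made for this sketch.
    """
    tp = tn = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        elif predicted == 1 and label == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def evaluate_at_threshold(y_true, scores, threshold=0.35):
    """Return ACC, TPR, TNR and MCC for the given decision threshold."""
    tp, tn, fp, fn = confusion_counts(y_true, scores, threshold)
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate (recall on fraud)
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate (specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "tpr": tpr, "tnr": tnr, "mcc": mcc}


# Hypothetical toy data: six legitimate (0) and two fraudulent (1) transactions.
TOY_LABELS = [0, 0, 0, 0, 0, 0, 1, 1]
TOY_SCORES = [0.05, 0.10, 0.20, 0.40, 0.15, 0.30, 0.45, 0.90]

if __name__ == "__main__":
    for t in (0.35, 0.50):
        print(t, evaluate_at_threshold(TOY_LABELS, TOY_SCORES, t))
```

On this toy data both thresholds give the same accuracy (7 of 8 correct), while the MCC differs because it also accounts for the missed fraud case at the higher threshold. This illustrates why an imbalance-aware index such as MCC is reported alongside ACC.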

Copyright: © 2024 Du et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Algorithms*
Dietary Supplements*
Fraud / prevention & control
Knowledge
Machine Learning

Grants and funding

The research was supported by the 2022 Special Project on Industrial Foundation Reconstruction and High-Quality Development of Manufacturing Industry of the Ministry of Industry and Information Technology (MIIT); grant number: 2207-370171-07-02-269966. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.