Healthcare insurance fraud detection using data mining

Zain Hamid; Fatima Khalique; Saba Mahmood; Ali Daud; Amal Bukhari; Bader Alshemaimri

doi:10.1186/s12911-024-02512-4

BMC Med Inform Decis Mak. 2024 Apr 26;24(1):112. doi: 10.1186/s12911-024-02512-4.

Authors

Zain Hamid^#¹, Fatima Khalique^#¹, Saba Mahmood^#¹, Ali Daud^#², Amal Bukhari^#³, Bader Alshemaimri^#⁴

Affiliations

¹ Department of Computer Science, Bahria University, Islamabad, Pakistan.
² Faculty of Resilience, Rabdan Academy, Abu Dhabi, United Arab Emirates. alimsdb@gmail.com.
³ Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia.
⁴ Software Engineering Department, College of Computing and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

^# Contributed equally.

Abstract

Background: Healthcare programs and insurance initiatives play a crucial role in ensuring that people have access to medical care. Healthcare insurance programs offer many benefits, but fraud continues to be a significant challenge for the insurance industry. Detection is difficult because fraud schemes are sophisticated and evolve in response to detection methods. Analysis of large volumes of healthcare data is hindered by complexity, data quality issues, and the need for real-time detection, while privacy concerns and false positives pose additional hurdles. The lack of standardization in coding and limited resources further complicate efforts to address fraudulent activities effectively.

Methodology: This study presents a fraud detection methodology that uses association rule mining augmented with unsupervised learning techniques to detect healthcare insurance fraud. The analysis uses the 2008-2010 DE-SynPUF dataset from the Centers for Medicare and Medicaid Services (CMS). The proposed methodology works in two stages. First, association rule mining is used to extract frequent rules from the transactions based on patient, service, and service provider features. Second, the extracted rules are passed to unsupervised classifiers, namely isolation forest (IF), cluster-based local outlier factor (CBLOF), empirical cumulative distribution-based outlier detection (ECOD), and one-class support vector machine (OCSVM), to identify fraudulent activity.
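The first stage described above can be illustrated with a minimal, dependency-free Apriori sketch. This is not the authors' implementation: the toy transactions, the item labels (`dx:`, `proc:`, `prov:`), the function names, and the support and confidence thresholds are all hypothetical, chosen only to show how frequent itemsets and rules are derived from claim-like records.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search: return {itemset: support} for frequent itemsets."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    # Level 1 candidates: every distinct item seen in the data.
    candidates = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose relative support meets the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in tx if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                # Subsets of a frequent itemset are always frequent themselves.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules


# Hypothetical claims: each transaction mixes a diagnosis, a procedure, a provider.
transactions = [
    {"dx:250", "proc:A", "prov:P1"},
    {"dx:250", "proc:A", "prov:P1"},
    {"dx:250", "proc:A", "prov:P2"},
    {"dx:401", "proc:B", "prov:P2"},
    {"dx:401", "proc:B", "prov:P1"},
    {"dx:250", "proc:B", "prov:P3"},
]

frequent = frequent_itemsets(transactions, min_support=0.3)
rules = association_rules(frequent, min_confidence=0.8)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(support, 3), round(confidence, 3))
```

In the second stage, features derived from such rules would be handed to an anomaly detector such as an isolation forest. That step is omitted here because the abstract does not specify how the rules are encoded for the detectors.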

Results: Descriptive analysis shows patterns and trends in the data, revealing interesting relationships among diagnosis codes, procedure codes, and physicians. The baseline anomaly detection algorithms generated results in 902.24 seconds. A second experiment, which retrieved frequent rules using association rule mining with the Apriori algorithm and combined them with the unsupervised techniques, ran in 868.18 seconds. Silhouette scores were used to compare the four anomaly detection techniques: CBLOF scored highest at 0.114, followed by isolation forest at 0.103, while ECOD and OCSVM scored lower at 0.063 and 0.060, respectively.
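For readers unfamiliar with the silhouette score used above, the following is a minimal pure-Python sketch of the standard definition, applied to a two-group labelling such as the inlier/outlier split a detector produces. The function name, the toy points, and the 0/1 label convention are assumptions made for illustration; the abstract does not state how the authors computed the score. The score ranges from -1 to 1, and values near 0 indicate that the groups overlap substantially.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two distinct labels")

    total = 0.0
    for i, label in enumerate(labels):
        own = groups[label]
        if len(own) == 1:
            continue  # a singleton group contributes 0 by convention
        # a: mean distance to the other members of the point's own group.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other group.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical 2-D feature vectors; label 0 = inlier, 1 = flagged as anomalous.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
well_separated = silhouette_score(points, [0, 0, 1, 1])
poorly_separated = silhouette_score(points, [0, 1, 0, 1])
print(well_separated, poorly_separated)
```

A labelling that matches the geometry of the data scores close to 1, whereas one that cuts across it scores below 0.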

Conclusion: The proposed methodology enhances healthcare insurance fraud detection by using association rule mining for pattern discovery and unsupervised classifiers for effective anomaly detection.

Keywords: Association rules mining techniques; Healthcare insurance; Healthcare insurance frauds; Unsupervised learning.

MeSH terms

Data Mining*
Fraud*
Humans
Insurance, Health*
United States