An Integrated Classification and Association Rule Technique for Early-Stage Diabetes Risk Prediction

Healthcare (Basel). 2022 Oct 18;10(10):2070. doi: 10.3390/healthcare10102070.

Abstract

The number of diabetic patients is increasing yearly worldwide, creating a need for timely intervention. Mortality rates are higher for diabetic patients with other serious health complications. Thus, early prediction of the disease improves healthcare quality and can prevent serious complications later. This paper constructs an efficient system for predicting diabetes in its early stage. The proposed system starts with a Local Outlier Factor (LOF)-based technique to detect outlier data. A Balanced Bagging Classifier (BBC) technique is then used to balance the data distribution. Finally, association rules are integrated with classification algorithms to develop a prediction model based on real data. Four classification algorithms were used: Artificial Neural Network (ANN), Decision Trees (DT), Support Vector Machines (SVM), and K-Nearest Neighbor (KNN). In addition, the Apriori algorithm was used to discover relationships between various factors. Results revealed that KNN provided the highest accuracy, 97.36%, among the applied algorithms. The Apriori algorithm extracted association rules based on the lift metric. Four association rules were produced from the 12 attributes with the highest correlation and information gain scores relative to the class attribute.

Keywords: BBC; LOF; association rules and healthcare data analytics; classification algorithms; data mining; disease prediction; early-stage diabetes.
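The abstract states that association rules were selected by lift. As a minimal sketch of that metric, the snippet below computes lift(A -> B) = support(A and B) / (support(A) x support(B)) on a toy set of records and ranks rules whose consequent is a class label. The attribute names, records, thresholds, and helper functions (`support`, `lift`, `rules_for_class`) are illustrative assumptions, not the paper's data or code, and the brute-force enumeration stands in for the candidate pruning that the Apriori algorithm itself performs.

```python
from itertools import combinations

# Toy records: each row is the set of items present, including a class label.
# Names and values are hypothetical, chosen only to illustrate the metric.
CLASS_LABELS = {"positive", "negative"}
records = [
    {"polyuria", "polydipsia", "positive"},
    {"polyuria", "polydipsia", "positive"},
    {"polyuria", "positive"},
    {"polydipsia", "positive"},
    {"polyuria", "polydipsia", "positive"},
    {"negative"},
    {"polyuria", "negative"},
    {"negative"},
    {"polydipsia", "negative"},
    {"negative"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= row for row in rows) / len(rows)


def lift(antecedent, consequent, rows):
    """Lift of the rule antecedent -> consequent.

    Values above 1 mean the antecedent and consequent co-occur more often
    than they would if they were independent.
    """
    joint = support(set(antecedent) | set(consequent), rows)
    return joint / (support(antecedent, rows) * support(consequent, rows))


def rules_for_class(rows, label, min_support=0.2, max_len=2):
    """Enumerate rules `items -> label` by brute force, ranked by lift.

    This is exhaustive enumeration with a support threshold, not Apriori's
    level-wise candidate pruning; it is adequate only for tiny examples.
    """
    items = sorted(set().union(*rows) - CLASS_LABELS)
    found = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            if support(set(antecedent) | {label}, rows) >= min_support:
                found.append((antecedent, lift(antecedent, {label}, rows)))
    return sorted(found, key=lambda rule: rule[1], reverse=True)


if __name__ == "__main__":
    for antecedent, value in rules_for_class(records, "positive"):
        print(f"{' & '.join(antecedent)} -> positive  lift={value:.2f}")
```

On this toy data the two-item rule ranks above either single-item rule, which is the kind of ordering a lift-based filter would use to keep only the strongest rules.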