Performance Analysis of Conventional Machine Learning Algorithms for Identification of Chronic Kidney Disease in Type 1 Diabetes Mellitus Patients

Nakib Hayat Chowdhury; Mamun Bin Ibne Reaz; Fahmida Haque; Shamim Ahmad; Sawal Hamid Md Ali; Ahmad Ashrif A Bakar; Mohammad Arif Sobhan Bhuiyan

doi:10.3390/diagnostics11122267

Diagnostics (Basel). 2021 Dec 3;11(12):2267. doi: 10.3390/diagnostics11122267.

Authors

Nakib Hayat Chowdhury¹ ², Mamun Bin Ibne Reaz¹, Fahmida Haque¹, Shamim Ahmad³, Sawal Hamid Md Ali¹, Ahmad Ashrif A Bakar¹, Mohammad Arif Sobhan Bhuiyan⁴

Affiliations

¹ Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia.
² Department of Computer Science and Engineering, Bangladesh Army University of Science and Technology (BAUST), Saidpur Cantonment, Saidpur 5310, Bangladesh.
³ Department of Computer Science and Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh.
⁴ Department of Electrical and Electronics Engineering, Xiamen University Malaysia, Bandar Sunsuria, Sepang 43900, Selangor, Malaysia.

Abstract

Chronic kidney disease (CKD) is one of the severe complications of type 1 diabetes mellitus (T1DM). However, the detection and diagnosis of CKD are often delayed because of its asymptomatic nature. In addition, patients often bypass the traditional urine protein (urinary albumin)-based CKD detection test. Even though disease detection using machine learning (ML) is a well-established field of study, it is rarely used to diagnose CKD in T1DM patients. This research aimed to employ and evaluate several ML algorithms to develop models that quickly predict CKD in patients with T1DM using easily available routine checkup data. This study analyzed 16 years of data from 1375 T1DM patients, obtained from the Epidemiology of Diabetes Interventions and Complications (EDIC) clinical trials directed by the National Institute of Diabetes and Digestive and Kidney Diseases, USA. Three data imputation techniques (RF, KNN, and MICE) and the SMOTETomek resampling technique were used to preprocess the primary dataset. Ten ML algorithms, including logistic regression (LR), k-nearest neighbor (KNN), Gaussian naïve Bayes (GNB), support vector machine (SVM), stochastic gradient descent (SGD), decision tree (DT), gradient boosting (GB), random forest (RF), extreme gradient boosting (XGB), and light gradient-boosted machine (LightGBM), were applied to develop prediction models. Each model included 19 demographic, medical history, behavioral, and biochemical features, and every feature's effect was ranked using three feature ranking techniques (XGB, RF, and Extra Tree). Lastly, each model's ROC, sensitivity (recall), specificity, accuracy, precision, and F-1 score were estimated to find the best-performing model. The RF classifier model exhibited the best performance with 0.96 (±0.01) accuracy, 0.98 (±0.01) sensitivity, and 0.93 (±0.02) specificity. LightGBM performed second best and was quite close to RF with 0.95 (±0.06) accuracy. In addition to these two models, KNN, SVM, DT, GB, and XGB models also achieved more than 90% accuracy.

Keywords: chronic kidney disease; machine learning; prediction model; type 1 diabetes mellitus.
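
The abstract compares models by sensitivity (recall), specificity, accuracy, precision, and F-1 score. As a minimal sketch of how these metrics follow from a binary confusion matrix, the Python below computes them from true and predicted labels. The names `ConfusionCounts`, `count_outcomes`, and `classification_metrics` are hypothetical helpers introduced for illustration, and the labels are made-up toy values, not data or results from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # CKD cases predicted as CKD
    fp: int  # non-CKD cases predicted as CKD
    tn: int  # non-CKD cases predicted as non-CKD
    fn: int  # CKD cases predicted as non-CKD


def count_outcomes(y_true, y_pred):
    """Tally the confusion matrix for binary labels (1 = CKD, 0 = no CKD)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent.
    return numerator / denominator if denominator else 0.0


def classification_metrics(c):
    """Derive the metrics named in the abstract from confusion counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall: detected CKD cases
    specificity = _ratio(c.tn, c.tn + c.fp)  # correctly cleared non-CKD cases
    precision = _ratio(c.tp, c.tp + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Toy labels for illustration only (assumed, not from the EDIC dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(count_outcomes(y_true, y_pred))
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting sensitivity and specificity alongside accuracy matters for a screening task like this one: with imbalanced classes, a model can reach high accuracy while missing many true CKD cases, which only the sensitivity figure would reveal.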
