Secure and privacy-preserving automated machine learning operations into end-to-end integrated IoT-edge-artificial intelligence-blockchain monitoring system for diabetes mellitus prediction

Comput Struct Biotechnol J. 2023 Nov 23:23:212-233. doi: 10.1016/j.csbj.2023.11.038. eCollection 2024 Dec.

Abstract

Diabetes mellitus, one of the leading causes of death worldwide, currently has no cure and, if left untreated, can lead to severe health complications such as retinopathy, limb amputation, cardiovascular disease, and neuronal disease. It is therefore crucial to monitor and predict the incidence of diabetes. Machine learning approaches for diabetes prediction have been proposed and evaluated in the literature. This paper proposes an IoT-edge-artificial intelligence (AI)-blockchain system that predicts diabetes from risk factors. The system is underpinned by blockchain, which provides a cohesive view of patients' risk-factor data across different hospitals and ensures the security and privacy of users' data. We also provide a comparative analysis of the medical sensors, devices, and methods that can measure and collect the risk-factor values used by the system. Numerical experiments and a comparative analysis were carried out within the proposed system on three real-life diabetes datasets, using the most accurate model, random forest (RF), and the two most widely used state-of-the-art machine learning approaches, logistic regression (LR) and support vector machine (SVM). The results show that, within the proposed system, RF predicts diabetes with 4.57% higher accuracy on average than LR and SVM, but requires 2.87 times more execution time. Data balancing without feature selection does not yield a significant improvement. With feature selection, performance improves by 1.14% on the PIMA Indian dataset and by 0.02% on the Sylhet dataset, but decreases by 0.89% on MIMIC III.

Keywords: Artificial intelligence (AI); Blockchain; Diabetes mellitus type 2; Diagnosis; Digital health; Logistic regression (LR); Machine learning; Prognosis; Random forest (RF); Risk factors; Smart connected healthcare; Support vector machine (SVM); eHealth.
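The abstract states that blockchain gives a cohesive, secure, and privacy-preserving view of risk-factor data shared across hospitals. The following is a minimal sketch of the core idea only, under stated assumptions: blocks are linked by SHA-256 hashes so that any later edit to a stored record is detectable, and patient identifiers are pseudonymized with a keyed HMAC before they are written. It is not the authors' implementation. It omits consensus, peer-to-peer replication, access control, and smart contracts, and every name in it (`Block`, `RiskFactorLedger`, `pseudonymize`, the record fields, and the key) is hypothetical.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass


def pseudonymize(patient_id: str, key: bytes) -> str:
    """Hypothetical helper: keyed HMAC-SHA256 so raw patient IDs never enter the ledger."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass
class Block:
    index: int
    timestamp: float
    records: list          # hypothetical risk-factor records (JSON-serializable dicts)
    prev_hash: str         # hash of the previous block, forming the chain
    block_hash: str = ""

    def compute_hash(self) -> str:
        # Canonical JSON (sorted keys) so the same content always hashes the same way.
        payload = json.dumps(
            {"index": self.index, "timestamp": self.timestamp,
             "records": self.records, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RiskFactorLedger:
    """Assumed single-node, append-only hash chain (no consensus or replication)."""

    def __init__(self) -> None:
        genesis = Block(0, 0.0, [], "0" * 64)
        genesis.block_hash = genesis.compute_hash()
        self.chain = [genesis]

    def append(self, records: list) -> Block:
        prev = self.chain[-1]
        block = Block(prev.index + 1, time.time(), records, prev.block_hash)
        block.block_hash = block.compute_hash()
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        # Each block must match its stored hash and point at its predecessor's hash.
        for i, block in enumerate(self.chain):
            if block.block_hash != block.compute_hash():
                return False
            if i > 0 and block.prev_hash != self.chain[i - 1].block_hash:
                return False
        return True


# Usage with illustrative values only (not taken from the paper's datasets).
KEY = b"hypothetical-deployment-key"
ledger = RiskFactorLedger()
ledger.append([{"patient": pseudonymize("P-001", KEY), "hospital": "A", "glucose": 148, "bmi": 33.6}])
ledger.append([{"patient": pseudonymize("P-002", KEY), "hospital": "B", "glucose": 85, "bmi": 26.6}])
print(ledger.is_valid())                      # True
ledger.chain[1].records[0]["glucose"] = 90    # simulate tampering with a stored record
print(ledger.is_valid())                      # False
```

Hashing the whole block, including the predecessor's hash, is what makes tampering evident: changing one record invalidates that block's hash and breaks every link after it. Using a keyed HMAC instead of a plain hash for patient IDs is a design choice assumed here, because an unkeyed hash of a short identifier can be reversed by brute force.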