Intelligent Machine Learning Approach for Effective Recognition of Diabetes in E-Healthcare Using Clinical Data

Sensors (Basel). 2020 May 6;20(9):2649. doi: 10.3390/s20092649.

Abstract

The accurate detection of diabetes has received significant attention, and developing a diagnosis system that detects diabetes reliably in the e-healthcare environment remains a major challenge for the research community. Machine learning techniques play an emerging role in healthcare services by providing systems that analyze medical data for the diagnosis of diseases. Existing diagnosis systems have drawbacks such as high computation time and low prediction accuracy. To address these issues, we propose a diagnosis system that uses machine learning methods for the detection of diabetes. The proposed method has been tested on the diabetes dataset, a clinical dataset compiled from patients' clinical histories. Model validation methods (hold out, K-fold, and leave one subject out) and performance evaluation metrics (accuracy, specificity, sensitivity, F1-score, receiver operating characteristic curve, and execution time) have been used to check the validity of the proposed system. We propose a filter method based on the Decision Tree (Iterative Dichotomiser 3) algorithm for selecting highly important features. Two ensemble learning algorithms, AdaBoost and Random Forest, are also used for feature selection, and classifier performance is compared against wrapper-based feature selection algorithms. A Decision Tree classifier has been used to classify healthy and diabetic subjects. The experimental results show that the features selected by the proposed feature selection algorithm improve the classification performance of the predictive model and achieve optimal accuracy. Additionally, the performance of the proposed system is high compared to previous state-of-the-art methods. The high performance of the proposed method is attributed to the different combinations of selected feature sets; Plasma glucose concentration, Diabetes pedigree function, and Body mass index are the most significant features in the dataset for the prediction of diabetes. Furthermore, statistical analysis of the experimental results demonstrated that the proposed method can effectively detect diabetes and can be deployed in an e-healthcare environment.
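
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels with 1 for diabetic and 0 for healthy subjects; the function names and the label encoding are illustrative assumptions and are not taken from the paper. It uses the standard textbook definitions of accuracy, sensitivity, specificity, and F1-score, not necessarily the authors' exact implementation.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN, assuming labels are 1 (diabetic) or 0 (healthy)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # diabetic subject correctly detected
        elif truth == 0 and pred == 1:
            fp += 1  # healthy subject wrongly flagged as diabetic
        elif truth == 0 and pred == 0:
            tn += 1  # healthy subject correctly cleared
        else:
            fn += 1  # diabetic subject missed
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and F1-score as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(classification_metrics(y_true, y_pred))
```

In a screening setting, sensitivity and specificity are usually reported alongside accuracy because a missed diabetic subject and a false alarm carry different costs.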

Keywords: decision tree; diabetes disease; e-healthcare; feature selection; machine learning; medical data; performance.

MeSH terms

  • Algorithms
  • Delivery of Health Care
  • Diabetes Mellitus* / diagnosis
  • Humans
  • Machine Learning*
  • ROC Curve
  • Telemedicine*