A Novel Approach for Feature Selection and Classification of Diabetes Mellitus: Machine Learning Methods

Comput Intell Neurosci. 2022 Apr 15;2022:3820360. doi: 10.1155/2022/3820360. eCollection 2022.

Abstract

Diabetes prediction is an active research area in which medical experts are trying to anticipate the disease with greater accuracy. Surveys conducted by the WHO have shown a marked increase in the number of diabetic patients. Diabetes often remains undetected for long periods and can aggravate co-existing conditions such as damage to the kidney vessels, retinal problems, and cardiac problems; if left unidentified, it can cause metabolic disorders and numerous complications in the body. The main objective of our study is to compare different classifiers and feature selection methods in order to predict diabetes with greater accuracy. In this paper, we studied multilayer perceptron, decision tree, K-nearest neighbour, and random forest classifiers, and applied a few feature selection techniques to these classifiers to detect diabetes at an early stage. The raw data were preprocessed by removing outliers and imputing missing values with the mean, followed by hyperparameter optimization. Experiments were conducted on the PIMA Indians diabetes dataset using Weka 3.9; the accuracy achieved was 77.60% for multilayer perceptron, 76.07% for decision trees, 78.58% for K-nearest neighbour, and 79.8% for random forest, the highest of the four classifiers.
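
The two preprocessing steps named in the abstract (outlier removal, then mean imputation) can be illustrated with a minimal Python sketch. It is not the authors' Weka workflow. The 1.5 x IQR outlier rule, the use of `None` to mark a missing value, and every function name below are assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import mean, quantiles


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def iqr_bounds(column, k=1.5):
    """Return (low, high) fences computed from the observed values.

    Assumption: the common k * IQR rule; the abstract does not say
    which outlier criterion was used.
    """
    observed = [v for v in column if v is not None]
    q1, _, q3 = quantiles(observed, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def drop_outlier_rows(rows, col_index, k=1.5):
    """Drop rows whose value in one column falls outside the fences.

    Rows with a missing value in that column are kept for imputation.
    """
    low, high = iqr_bounds([r[col_index] for r in rows], k)
    return [
        r for r in rows
        if r[col_index] is None or low <= r[col_index] <= high
    ]


def preprocess(rows):
    """Remove outliers column by column, then mean-impute what is missing."""
    n_cols = len(rows[0])
    for j in range(n_cols):
        rows = drop_outlier_rows(rows, j)
    columns = [impute_mean([r[j] for r in rows]) for j in range(n_cols)]
    return [list(r) for r in zip(*columns)]
```

Calling `preprocess` on a list of numeric feature rows returns a new list with outlying rows dropped and gaps filled. Outliers are removed before imputation, following the order given in the abstract, so extreme values do not distort the column means used as fill values.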

MeSH terms

  • Cluster Analysis
  • Diabetes Mellitus* / diagnosis
  • Humans
  • Machine Learning*
  • Neural Networks, Computer