Revolutionizing Diabetes Diagnosis: Machine Learning Techniques Unleashed

Healthcare (Basel). 2023 Oct 31;11(21):2864. doi: 10.3390/healthcare11212864.

Abstract

Diabetes is a complex, multifaceted disease that disrupts the body's ability to process glucose, a fundamental source of energy for cells. This study aims to predict the occurrence of diabetes in individuals by applying machine learning algorithms to the PIMA diabetes dataset. The algorithms selected for this study are Decision Tree, K-Nearest Neighbor, Random Forest, Logistic Regression, and Support Vector Machine. The experiments were run with two software tools: the Waikato Environment for Knowledge Analysis (WEKA) version 3.8.1 and Python version 3.10. Performance was evaluated using several metrics, including true positive rate, false positive rate, precision, recall, F-measure, Matthews correlation coefficient, area under the receiver operating characteristic curve, and area under the precision-recall curve. In addition, error measures such as Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, and Root Relative Squared Error were examined to assess the accuracy of the models. Logistic Regression outperformed the other techniques, achieving the highest precision: 81 percent using Python and 80.43 percent using WEKA. These findings indicate that machine learning can be effective in predicting diabetes and highlight the potential of Logistic Regression as a valuable tool in this domain.
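As a rough illustration of the evaluation metrics named in the abstract, the sketch below computes them from first principles in plain Python. It is not the study's code: the function names `classification_metrics` and `error_measures` and all example numbers are hypothetical. The relative errors here are measured against a baseline that always predicts the mean of the evaluated targets; WEKA's exact baseline and its percentage scaling may differ.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute binary classification metrics from confusion-matrix counts."""
    recall = _safe_div(tp, tp + fn)        # true positive rate
    fpr = _safe_div(fp, fp + tn)           # false positive rate
    precision = _safe_div(tp, tp + fp)
    f_measure = _safe_div(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient: ranges from -1 to +1.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "tpr": recall,
        "fpr": fpr,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


def error_measures(y_true, y_pred):
    """Compute MAE, RMSE, RAE and RRSE as fractions (not percentages).

    Assumption: the relative errors compare the model against a baseline
    that always predicts the mean of y_true.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_error = sum(abs(p - t) for t, p in zip(y_true, y_pred))
    sq_error = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    baseline_abs = sum(abs(t - mean_true) for t in y_true)
    baseline_sq = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": abs_error / n,
        "rmse": math.sqrt(sq_error / n),
        "rae": _safe_div(abs_error, baseline_abs),
        "rrse": math.sqrt(_safe_div(sq_error, baseline_sq)),
    }


if __name__ == "__main__":
    # Hypothetical counts and predictions, for illustration only.
    print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
    print(error_measures([0, 1, 1, 0], [0.1, 0.8, 0.6, 0.3]))
```

Working from raw confusion-matrix counts keeps the definitions visible. In practice a library such as scikit-learn provides equivalent functions for most of these metrics.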

Keywords: Matthews correlation coefficient; PIMA diabetes dataset; Python; WEKA; accuracy; diabetes; machine learning algorithms.