A novel stacking technique for prediction of diabetes

Comput Biol Med. 2021 Aug;135:104554. doi: 10.1016/j.compbiomed.2021.104554. Epub 2021 Jun 8.

Abstract

Background: Machine learning (ML) is a rapidly growing technology that provides effective solutions to complex problems. The application of ML techniques in healthcare is attracting increasing attention because ML can identify patterns automatically. Diabetes is characterized by hyperglycemia resulting from impaired insulin secretion, impaired insulin utilization, or both.

Methods: The PIMA Indian diabetes dataset, obtained from the University of California, Irvine (UCI) machine learning repository, was used for the experiments. The study was carried out in three stages: (1) a correlation technique was developed for feature selection; (2) the AdaBoost technique was applied to the selected features for classification; and (3) a novel stacking technique combining a multi-layer perceptron (MLP), a support vector machine (SVM), and logistic regression (LR) was designed and developed for the selected features.
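Stage (1) can be illustrated with a minimal NumPy sketch of a correlation filter. The abstract does not specify the correlation criterion or the cut-off, so the use of Pearson's r, the function name `select_by_correlation`, and the `threshold=0.2` default are illustrative assumptions, not the authors' method.

```python
import numpy as np


def select_by_correlation(X, y, threshold=0.2):
    """Keep columns of X whose |Pearson r| with y is at least `threshold`.

    Hypothetical helper: the criterion and threshold are assumptions,
    not the procedure reported in the paper.
    Returns (selected column indices, per-column r values).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center features and target so X_c^T y_c is the covariance numerator.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant (zero-variance) columns get r = 0 instead of dividing by zero.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    selected = np.flatnonzero(np.abs(r) >= threshold)
    return selected, r
```

Applied to the PIMA data, `X` would hold the clinical attributes and `y` the binary diabetes outcome; the returned indices would then define the feature subset passed to the classifiers in stages (2) and (3).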

Results: By integrating the intelligent models, the proposed stacking technique improved model performance and overcame AdaBoost's need to generate multiple decision stumps. The proposed stacking technique outperformed the other models, including AdaBoost, on the performance metrics evaluated. The proposed models were then applied to other datasets, such as the Cleveland heart disease and Wisconsin diagnostic breast cancer datasets, to demonstrate their broader applicability.
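The AdaBoost-versus-stacking comparison can be sketched with scikit-learn. This is an illustration under stated assumptions: the abstract does not say which of MLP, SVM, and LR acts as the meta-learner, so the sketch assumes MLP and SVM as base learners with LR as the meta-learner, and the helper names (`build_adaboost`, `build_stacking`, `compare`) and all hyperparameters are hypothetical. It uses scikit-learn's bundled copy of the Wisconsin diagnostic breast cancer dataset (`load_breast_cancer`) so that it runs without downloads.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_adaboost(n_estimators=50):
    # AdaBoostClassifier's default base learner is a depth-1 decision tree
    # (a stump), so the model is a weighted vote over n_estimators stumps.
    return AdaBoostClassifier(n_estimators=n_estimators, random_state=0)


def build_stacking():
    # Assumed arrangement (not confirmed by the abstract): MLP and SVM as
    # level-0 learners, LR as the meta-learner. Hyperparameters are hypothetical.
    base_learners = [
        ("mlp", make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(16,),
                                            max_iter=1000, random_state=0))),
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    ]
    # The meta-learner is trained on out-of-fold base-learner outputs (cv=5),
    # so it never sees predictions made on the base learners' own training rows.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression(max_iter=1000),
                              cv=5)


def compare(X, y):
    """Mean 5-fold cross-validated accuracy for each ensemble."""
    scores = {}
    for name, model in [("adaboost", build_adaboost()),
                        ("stacking", build_stacking())]:
        scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    return scores
```

For example, `compare(*load_breast_cancer(return_X_y=True))` returns a dictionary of mean accuracies for the two ensembles; substituting the PIMA features selected in stage (1) for `X` would mirror the study design more closely. The sketch makes no claim about the scores reported in the paper.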

Conclusion: Stacking can outperform the other reported techniques that have been applied to the PIMA Indian diabetes dataset.

Keywords: AdaBoost classifier; Logistic regression; Multilayer perceptron; Stacking; Support vector machines.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Diabetes Mellitus* / diagnosis
  • Humans
  • Logistic Models
  • Machine Learning*
  • Neural Networks, Computer
  • Support Vector Machine