Heart disease prediction using supervised machine learning algorithms: Performance analysis and comparison

Comput Biol Med. 2021 Sep:136:104672. doi: 10.1016/j.compbiomed.2021.104672. Epub 2021 Jul 21.

Abstract

Machine learning and data mining-based approaches to the prediction and detection of heart disease would be of great clinical utility but are highly challenging to develop. Most countries face a shortage of cardiovascular expertise and a significant rate of misdiagnosed cases. Both problems could be addressed by accurate and efficient early-stage heart disease prediction that uses digital patient records to support clinical decision-making analytically. This study aimed to identify the machine learning classifiers with the highest accuracy for such diagnostic purposes. Several supervised machine learning algorithms were applied to heart disease prediction and compared for performance and accuracy. Feature importance scores were estimated for every feature with all applied algorithms except the multilayer perceptron (MLP) and k-nearest neighbor (KNN). All features were then ranked by importance score to identify those most predictive of heart disease. On a heart disease dataset collected from Kaggle, three classifiers were compared, based on the KNN, decision tree (DT) and random forest (RF) algorithms. The RF method achieved 100% accuracy, together with 100% sensitivity and 100% specificity. Thus, we found that a relatively simple supervised machine learning algorithm can make heart disease predictions with very high accuracy and has excellent potential utility.
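The evaluation steps summarized above can be sketched in plain Python: ranking features by importance score, then reporting accuracy, sensitivity and specificity. This is a minimal illustration, not the authors' code. It assumes binary labels (1 = heart disease, 0 = no heart disease). The helper names `confusion_counts`, `evaluate` and `rank_features`, along with the example labels, feature names and scores, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), assuming binary labels: 1 = disease, 0 = no disease."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # diseased patient correctly flagged
        elif t == 0 and p == 0:
            tn += 1  # healthy patient correctly cleared
        elif t == 0 and p == 1:
            fp += 1  # healthy patient wrongly flagged
        elif t == 1 and p == 0:
            fn += 1  # diseased patient missed
        else:
            raise ValueError(f"labels must be 0 or 1, got {t!r}, {p!r}")
    return tp, tn, fp, fn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # undefined if there are no positive cases
        "specificity": tn / (tn + fp),  # undefined if there are no negative cases
    }


def rank_features(importances: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort (feature, importance score) pairs from most to least important."""
    return sorted(importances.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical test-set labels and classifier predictions.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    print(evaluate(y_true, y_pred))

    # Hypothetical importance scores, e.g. as produced by a tree-based model.
    scores = {"feature_a": 0.12, "feature_b": 0.47, "feature_c": 0.31}
    print(rank_features(scores))
```

A perfect classifier, as reported for RF in this study, gives 1.0 for all three metrics. Sensitivity or specificity cannot be computed (division by zero) when the evaluation set contains no positive or no negative cases.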

Keywords: Cardiovascular disease; Decision tree; KNN; Machine learning; Random forest.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Heart Diseases*
  • Humans
  • Supervised Machine Learning*