Linear discriminant analysis and principal component analysis to predict coronary artery disease

Health Informatics J. 2020 Sep;26(3):2181-2192. doi: 10.1177/1460458219899210. Epub 2020 Jan 23.

Abstract

Coronary artery disease is one of the most prevalent chronic pathologies in the modern world, causing thousands of deaths in both the United States and Europe. This article reports the use of data mining techniques to analyse a population of 10,265 people who were evaluated for myocardial ischaemia by the Department of Advanced Biomedical Sciences. Overall, 22 features are extracted, and linear discriminant analysis is implemented twice, once in the Knime analytics platform and once in the R statistical programming language, to classify patients as either normal or pathological. The Knime analysis performs classification only, whereas the R analysis applies principal component analysis before classification to create new features. The classification accuracies obtained for these methods were 84.5 and 86.0 per cent, respectively, with a specificity over 97 per cent and a sensitivity between 62 and 66 per cent. This article presents a practical implementation of traditional data mining techniques that can help clinicians in decision-making; moreover, it shows principal component analysis used as a feature-reduction algorithm.
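The two pipelines compared in the abstract (linear discriminant analysis alone, and principal component analysis followed by linear discriminant analysis) can be sketched in NumPy as below, together with the accuracy, sensitivity and specificity figures used to report results. This is a minimal illustration under stated assumptions, not the authors' Knime or R workflow: it uses textbook two-class Gaussian LDA with a pooled covariance matrix, standardizes features before PCA, and runs on randomly generated synthetic data. The 22-feature width mirrors the abstract; the 30 per cent prevalence, the shifted features, the train/test split and the choice of 10 principal components are hypothetical values chosen only for illustration.

```python
import numpy as np


def standardize(X_train, X_test):
    """Scale features to zero mean and unit variance using training statistics (an assumed preprocessing step)."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mu) / sd, (X_test - mu) / sd


def fit_pca(X, n_components):
    """Principal axes of centred X: covariance eigenvectors, largest variance first."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]  # shape (n_features, n_components)


def fit_lda(X, y):
    """Two-class Gaussian LDA with pooled covariance; predict 1 when X @ w + b > 0."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance matrix
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    w = np.linalg.solve(S, mu1 - mu0)
    # Midpoint threshold plus log prior ratio from class frequencies
    p1 = len(X1) / len(X)
    b = -0.5 * (mu0 + mu1) @ w + np.log(p1 / (1 - p1))
    return w, b


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity and specificity with 1 = pathological, 0 = normal."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # fraction of pathological cases detected
        "specificity": tn / (tn + fp),  # fraction of normal cases correctly cleared
    }


def run_pipeline(X_train, y_train, X_test, y_test, n_components=None):
    """LDA on standardized features, optionally preceded by PCA feature reduction."""
    X_train, X_test = standardize(X_train, X_test)
    if n_components is not None:
        W = fit_pca(X_train, n_components)
        X_train, X_test = X_train @ W, X_test @ W
    w, b = fit_lda(X_train, y_train)
    y_pred = (X_test @ w + b > 0).astype(int)
    return evaluate(y_test, y_pred)


# Synthetic, hypothetical data: 22 features, about 30% pathological cases,
# with the first five features shifted for the pathological class.
rng = np.random.default_rng(0)
n, d = 2000, 22
y = (rng.random(n) < 0.3).astype(int)
X = rng.normal(size=(n, d))
X[y == 1, :5] += 1.0

X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]
lda_only = run_pipeline(X_tr, y_tr, X_te, y_te)
pca_lda = run_pipeline(X_tr, y_tr, X_te, y_te, n_components=10)

for name, res in [("LDA only", lda_only), ("PCA + LDA", pca_lda)]:
    print(name, {k: round(float(v), 3) for k, v in res.items()})
```

Because the pooled-covariance LDA rule is invariant to per-feature scaling, standardization matters only for the PCA step, where it stops large-valued features from dominating the principal axes. The numbers printed here describe the synthetic data only and say nothing about the study's cohort.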

Keywords: cardiology; clinical decision-making; data mining; linear discriminant analysis; principal component analysis.

MeSH terms

  • Algorithms
  • Coronary Artery Disease* / diagnosis
  • Discriminant Analysis
  • Europe
  • Humans
  • Principal Component Analysis