Statistical and machine learning methods for analysis of multiplex protein data from a novel proximity extension assay in patients with ST-elevation myocardial infarction

Sci Rep. 2021 Jul 2;11(1):13787. doi: 10.1038/s41598-021-93162-3.

Abstract

Using data from patients with ST-elevation myocardial infarction (STEMI), we explored how machine learning methods can be used to analyse multiplex protein data obtained from proximity extension assays. Blood samples were obtained from 48 STEMI patients at admission and after three months. A subset of patients also had blood samples obtained at 4 and 12 h after admission. Multiplex protein data were obtained using a proximity extension assay. A random forest model was used to assess the predictive power and importance of biomarkers for distinguishing between the acute and the stable phase. The similarity of response profiles was investigated using K-means clustering. Out of 92 proteins, 26 were found to significantly distinguish the acute from the stable phase following STEMI. Five proteins (tissue factor pathway inhibitor, azurocidin, spondin-1, myeloperoxidase and myoglobin) were found to be highly important for differentiating between the acute and the stable phase. Four of these proteins shared response profiles over the four time-points. Machine learning methods can be used to identify and assess novel predictive biomarkers, as showcased in the present study population of patients with STEMI.
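To make the clustering step concrete, the following is a minimal sketch of K-means applied to protein response profiles over four time-points. It is not the authors' implementation. The protein names and values are invented for illustration, the z-scoring of each profile is an assumption (so that clusters reflect the shape of the response rather than its level), and the deterministic farthest-first initialisation is a simplification chosen to keep the example reproducible.

```python
def zscore(profile):
    """Standardise one profile so clustering compares shape, not absolute level."""
    mean = sum(profile) / len(profile)
    sd = (sum((x - mean) ** 2 for x in profile) / len(profile)) ** 0.5
    if sd == 0:
        return [0.0 for _ in profile]
    return [(x - mean) / sd for x in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Plain K-means (Lloyd's algorithm) with farthest-first initialisation."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from the centroids chosen so far (an illustrative choice).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles at admission, 4 h, 12 h and 3 months (made-up values).
profiles = {
    "protein_A": [8.0, 6.0, 4.0, 2.0],
    "protein_B": [9.5, 7.4, 5.6, 3.5],
    "protein_C": [2.0, 4.0, 6.0, 8.0],
    "protein_D": [1.0, 3.2, 4.9, 7.0],
    "protein_E": [7.0, 5.1, 3.0, 1.2],
}

names = list(profiles)
points = [zscore(profiles[name]) for name in names]
labels, centroids = kmeans(points, k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

In this toy data the proteins whose levels fall after admission end up in one cluster and those whose levels rise end up in the other. With real assay data, the number of clusters and the choice of scaling would need to be justified separately.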

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Biomarkers / blood*
  • Blood Proteins / genetics*
  • Female
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • ST Elevation Myocardial Infarction / blood*
  • ST Elevation Myocardial Infarction / diagnosis*
  • ST Elevation Myocardial Infarction / genetics
  • ST Elevation Myocardial Infarction / pathology
  • Supervised Machine Learning

Substances

  • Biomarkers
  • Blood Proteins