Prediction of mortality in Intensive Care Units: a multivariate feature selection

J Biomed Inform. 2020 Jul:107:103456. doi: 10.1016/j.jbi.2020.103456. Epub 2020 May 23.

Abstract

Context: The critical condition of patients in Intensive Care Units (ICUs) demands intensive monitoring of their vital signs as well as highly qualified professional assistance. Together, these needs make ICUs very expensive, so investment must be prioritized. Administrative issues emerge, and health institutions face dilemmas such as "How many beds should an ICU provide to serve the population at the lowest cost?" and "Which physiological information is most critical to monitor in an ICU?". Because of their financial and ethical implications, these judgments require precise technical knowledge. Decisions have usually relied on clinical scores, such as APACHE (Acute Physiology And Chronic Health Evaluation) and SOFA (Sequential Organ Failure Assessment), which are imprecise and outdated. The popularization of machine learning techniques has shed some light on the topic as a way to renew the purpose of such scores. In 2012, PhysioNet/Computing in Cardiology launched a Challenge on ICU patients that aimed to stimulate the development of techniques to predict mortality in ICUs. Based on biometric and physiological features collected from patients, the participants used their own classifiers to predict each patient's risk of death. Several participants achieved better results than the SOFA and APACHE scores; even so, the prediction levels were ≈54%, which is weak.

Objectives: Here, we investigate the reasons that led to these results as a means to ground our solution. Then, we propose alternative practices in an attempt to improve the results. Our main goal is to improve the prediction of mortality in ICUs by using the same data employed in the 2012 PhysioNet Challenge. Our specific objectives are (i) to simplify the problem by reducing its dimensionality; (ii) to reduce the uncontrolled variance; and (iii) to make classifiers less dependent on the training set.

Methods: Accordingly, we propose a methodology comprising several steps, including sample filtering and data normalization. To select features and to reduce the intra-group variance, we employ multivariate data analysis, applying Principal Component Analysis, Factor Analysis, Spectral Clustering, and Tukey's HSD Test recursively. We then use several machine learning methods to create classifiers. We evaluate our results with the same metrics proposed by the 2012 PhysioNet Challenge.
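
As a rough illustration of the normalization and PCA steps mentioned above, the following minimal sketch standardizes each feature and ranks features by the magnitude of their loading on the first principal component. It is a simplification and not the authors' procedure: the recursive combination with Factor Analysis, Spectral Clustering, and Tukey's HSD Test is not reproduced, the leading component is approximated by plain power iteration, and the feature names (`heart_rate`, `lactate`, `noise`) and all values are hypothetical.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each column to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


def covariance_matrix(centered_rows):
    """Sample covariance matrix of data whose columns are already centered."""
    n = len(centered_rows)
    d = len(centered_rows[0])
    return [
        [sum(row[i] * row[j] for row in centered_rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov, iterations=500):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    vector = [1.0 / math.sqrt(d)] * d  # fixed start, so the result is deterministic
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    return vector


def rank_features_by_loading(rows, names):
    """Return (name, loading) pairs sorted by absolute first-component loading."""
    standardized = zscore_columns(rows)
    loadings = first_principal_component(covariance_matrix(standardized))
    return sorted(zip(names, loadings), key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical toy data: two strongly correlated features and one unrelated one.
toy_names = ["heart_rate", "lactate", "noise"]
toy_rows = [
    (70, 1.0, 5),
    (80, 1.5, 3),
    (90, 2.1, 6),
    (100, 2.4, 2),
    (110, 3.0, 4),
    (120, 3.6, 5),
]

ranked = rank_features_by_loading(toy_rows, toy_names)
for name, loading in ranked:
    print(f"{name}: {loading:+.3f}")
```

In this toy case the two correlated features dominate the first component while the unrelated one receives a loading close to zero, which is the kind of signal a loading-based filter could use to discard features.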

Results: Among classifiers constructed and tested on independent datasets, our best classifier was a linear SVM, which achieved a score of ≈0.73. This result was significantly better than the ≈0.54 achieved in previous work, at a confidence level above 99%. Furthermore, our approach required only twelve features, consistently fewer than the number of features required by previous approaches.
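
For readers unfamiliar with the Challenge scoring, the sketch below computes the Event 1 score of the 2012 PhysioNet/Computing in Cardiology Challenge, defined as the minimum of sensitivity and positive predictive value for the binary in-hospital death prediction. It assumes that the ≈0.73 and ≈0.54 values above refer to this score, which the abstract does not state explicitly; the function name and the label vectors are hypothetical.

```python
def challenge_event1_score(y_true, y_pred):
    """Return min(sensitivity, positive predictive value) for binary labels.

    Labels use 1 for in-hospital death and 0 for survival. A ratio with a
    zero denominator is treated as 0.0 (an assumption made for this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)

    sensitivity = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    ppv = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    return min(sensitivity, ppv)


# Hypothetical example: 4 deaths among 10 patients, 3 of them predicted,
# plus 1 false alarm.
example_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
example_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
example_score = challenge_event1_score(example_true, example_pred)
print(example_score)
```

Because the score takes the minimum of the two quantities, a classifier cannot do well by predicting death for nearly everyone (high sensitivity, low precision) or for almost no one (the reverse).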

Conclusion: Our results indicated that our approach offered (a) higher performance in predicting death risk (+20%); (b) less dependence on the training set; and (c) lower costs for ICU monitoring (fewer features). Besides better predictive power, our approach also had lower implementation costs and could be applied to a wider range of ICUs. Future studies should use our proposal to investigate the inclusion of physiological features that were not available in the 2012 PhysioNet Challenge.

Keywords: Clinical Decision Support; Critical care; Intensive Care Units; Machine learning; Mortality prediction; Multivariate Data Analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • APACHE
  • Hospital Mortality
  • Humans
  • Intensive Care Units*
  • Machine Learning*
  • Vital Signs