A Machine-Learning-Based Prediction Method for Hypertension Outcomes Based on Medical Data

Diagnostics (Basel). 2019 Nov 7;9(4):178. doi: 10.3390/diagnostics9040178.

Abstract

Hypertension outcomes are the death or serious complications (such as myocardial infarction or stroke) that may occur in patients with hypertension. These outcomes are a major concern for patients and doctors and should be avoided where possible. However, there is no satisfactory method for predicting them. This paper therefore proposes a method for predicting outcomes from the physical examination indicators of hypertension patients. We divide outcome prediction into two steps: first, extract the key features from the patients' many physical examination indicators; second, use those key features to predict the patients' outcomes. To this end, we propose a model that combines recursive feature elimination with cross-validation and a classification algorithm. In the first step, we use the recursive feature elimination algorithm to rank all features by importance and then select the optimal feature subset using cross-validation. In the second step, we use four classification algorithms (support vector machine (SVM), C4.5 decision tree, random forest (RF), and extreme gradient boosting (XGBoost)) to predict patient outcomes from the optimal feature subset. Prediction performance is evaluated by accuracy, F1 measure, and area under the receiver operating characteristic curve (AUC). Ten-fold cross-validation shows that C4.5, RF, and XGBoost achieve very good prediction results with a small number of features, and that the classifiers perform better after feature selection by recursive feature elimination with cross-validation. Among the four classifiers, XGBoost has the best prediction performance: using the optimal feature subset, its accuracy, F1, and AUC values are 94.36%, 0.875, and 0.927, respectively. This prediction of hypertension outcomes contributes to the in-depth study of hypertension complications and has strong practical significance.
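
The three evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' code, which the abstract does not describe. The function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made up for illustration. They are not data from the study. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that equal the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = adverse outcome, 0 = no adverse outcome.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
    # Assumed decision threshold of 0.5 turns scores into predicted labels.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]

    print("accuracy:", accuracy(y_true, y_pred))
    print("F1:", f1_measure(y_true, y_pred))
    print("AUC:", roc_auc(y_true, scores))
```

On this toy data the sketch gives an accuracy of 0.75, an F1 of 0.75, and an AUC of 0.9375. Accuracy and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank the cases, which is one reason the three are reported together.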

Keywords: XGBoost; classification algorithm; feature selection; hypertension outcomes; prediction; recursive feature elimination.