Predicting hypertension using machine learning: Findings from Qatar Biobank Study

Latifa A AlKaabi; Lina S Ahmed; Maryam F Al Attiyah; Manar E Abdel-Rahman

doi:10.1371/journal.pone.0240370

PLoS One. 2020 Oct 16;15(10):e0240370. doi: 10.1371/journal.pone.0240370. eCollection 2020.

Authors

Latifa A AlKaabi¹, Lina S Ahmed¹, Maryam F Al Attiyah¹, Manar E Abdel-Rahman¹

Affiliation

¹ Department of Public Health, College of Health Science, QU Health, Qatar University, Doha, Qatar.

Abstract

Background and objective: Hypertension, a global burden, is associated with several risk factors and can be treated with lifestyle modifications and medications. Prediction and early diagnosis are important to prevent related health complications. The objective is to construct and compare predictive models that identify individuals at high risk of developing hypertension without the need for invasive clinical procedures.

Methods: This is a cross-sectional study using 987 records of Qataris and long-term residents aged 18+ years from Qatar Biobank. Percentages were used to summarize the data and chi-square tests to assess associations. Predictive models of hypertension were constructed and compared using three supervised machine learning algorithms (decision tree, random forest, and logistic regression), each evaluated with 5-fold cross-validation. The performance of the algorithms was assessed using accuracy, positive predictive value (PPV), sensitivity, F-measure, and area under the receiver operating characteristic curve (AUC). Stata and Weka were used for the analysis.
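As a rough illustration of the evaluation described in the Methods, the sketch below splits records into 5 folds and computes accuracy, PPV, sensitivity, and F-measure from confusion-matrix counts. It is not the authors' Stata/Weka workflow: the function names `k_fold_indices` and `classification_metrics` are hypothetical, the folds are contiguous rather than shuffled or stratified, and the confusion counts are made up. Only the record count (987) and the number of folds (5) come from the abstract.

```python
from typing import Dict, Iterator, List, Tuple


def k_fold_indices(n_records: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds of near-equal size.

    Assumption: no shuffling or stratification; real tools usually do both.
    """
    # The first (n_records % k) folds receive one extra record.
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_records))
        yield train, test
        start += size


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, PPV, sensitivity, and F-measure from confusion counts."""
    total = tp + fp + fn + tn
    # PPV (precision): share of predicted positives that are true positives.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # Sensitivity (recall): share of actual positives that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of PPV and sensitivity.
    denom = ppv + sensitivity
    f_measure = 2 * ppv * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "ppv": ppv,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # 987 is the record count from the Methods; the counts below are invented.
    for fold, (train, test) in enumerate(k_fold_indices(987, k=5), start=1):
        print(f"fold {fold}: train={len(train)} test={len(test)}")
    print(classification_metrics(tp=40, fp=10, fn=10, tn=140))
```

In a cross-validated comparison, each model would be fitted on the training indices of a fold and scored on the held-out indices, and the per-fold metrics would then be averaged. AUC is omitted here because it requires predicted scores rather than counts.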

Results: Age, gender, education level, employment, tobacco use, physical activity, adequate consumption of fruits and vegetables, abdominal obesity, history of diabetes, history of high cholesterol, and mother's history of high blood pressure were important predictors of hypertension. All algorithms showed similar performance: random forest (accuracy = 82.1%, PPV = 81.4%, sensitivity = 82.1%), logistic regression (accuracy = 81.1%, PPV = 80.1%, sensitivity = 81.1%), and decision tree (accuracy = 82.1%, PPV = 81.2%, sensitivity = 82.1%). In terms of AUC, random forest performed similarly to logistic regression, whereas decision tree had significantly lower discrimination ability (p-value < 0.05); the AUCs for logistic regression, random forest, and decision tree were 85.0, 86.9, and 79.9, respectively.

Conclusions: Machine learning offers the opportunity to build a rapid predictive model that uses non-invasive predictors to screen for hypertension. Future research should consider improving the predictive accuracy of models in larger general populations by including more important predictors and using a variety of algorithms.

Publication types

Comparative Study

MeSH terms

Area Under Curve
Biological Specimen Banks*
Cross-Sectional Studies
Early Diagnosis
Female
Humans
Hypertension / epidemiology*
Male
Predictive Value of Tests
Qatar / epidemiology
Supervised Machine Learning

Grants and funding

The author(s) received no specific funding for this work.