Building a predictive model for hypertension related to environmental chemicals using machine learning

Environ Sci Pollut Res Int. 2024 Jan;31(3):4595-4605. doi: 10.1007/s11356-023-31384-w. Epub 2023 Dec 17.

Abstract

Hypertension is a chronic cardiovascular disease characterized by elevated blood pressure that can lead to a number of complications. There is evidence that the many environmental substances to which humans are exposed contribute to the development of disease. In this work, we investigated the relationship between exposure to environmental contaminants and hypertension, as well as the predictive value of such exposures. Data were obtained from the National Health and Nutrition Examination Survey (NHANES), 2005-2012. A total of 4492 participants were included in our study, and we selected the more common environmental chemicals and covariates by feature selection followed by regularized network analysis. We then applied several machine learning (ML) methods, namely extreme gradient boosting (XGBoost), random forest classifier (RF), logistic regression (LR), multilayer perceptron (MLP), and support vector machine (SVM), to predict hypertension from chemical exposure. Finally, SHapley Additive exPlanations (SHAP) were applied to interpret the features. After the initial feature screening, a total of 29 variables (including 21 chemicals) were included for ML. The areas under the curve (AUCs) of the five ML models XGBoost, RF, LR, MLP, and SVM were 0.729, 0.723, 0.721, 0.730, and 0.731, respectively. Butylparaben (BUP), propylparaben (PPB), and 9-hydroxyfluorene (P17) were the three factors in the prediction model with the highest SHAP values. Across the five ML models, the results suggest that environmental exposure may play an important role in hypertension. Identifying the important chemical exposure parameters lays the groundwork for more targeted therapies, and the optimized ML models show potential for predicting hypertension.
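The models above are compared by AUC. As a minimal sketch of what that metric measures, the AUC can be computed as the probability that a randomly chosen hypertensive participant receives a higher predicted score than a randomly chosen non-hypertensive one, with ties counted as half. The function name `roc_auc` and the toy labels and scores below are hypothetical illustrations, not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = hypertension in this illustration).
    scores: iterable of predicted scores, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six participants, three with the outcome.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values near 0.73, as reported for the five models, indicate moderate discrimination. The pairwise loop is quadratic in sample size; a rank-based computation would be preferable for large cohorts.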

Keywords: Environmental exposures; Hypertension; Machine learning; Prediction.

MeSH terms

  • Area Under Curve
  • Cardiovascular Diseases*
  • Humans
  • Hypertension* / epidemiology
  • Machine Learning
  • Nutrition Surveys