Prediction of Preeclampsia from Clinical and Genetic Risk Factors in Early and Late Pregnancy Using Machine Learning and Polygenic Risk Scores

medRxiv [Preprint]. 2023 Feb 7:2023.02.03.23285385. doi: 10.1101/2023.02.03.23285385.

Abstract

Background: Preeclampsia, a pregnancy-specific condition associated with new-onset hypertension after 20 weeks' gestation, is a leading cause of maternal and neonatal morbidity and mortality. Predictive tools that identify which individuals are most at risk are needed.

Methods: We identified a cohort of N=1,125 pregnant individuals who delivered between 05/2015 and 05/2022 at Mass General Brigham hospitals and had available electronic health record (EHR) data and linked genetic data. Using clinical EHR data and systolic blood pressure polygenic risk scores (SBP PRS) derived from a large genome-wide association study, we developed machine learning (xgboost) and linear regression models to predict preeclampsia risk.
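As an illustration of the score referred to above: a polygenic risk score is commonly computed as a weighted sum of an individual's risk-allele dosages, with weights taken from genome-wide association study effect sizes. The abstract does not describe the scoring procedure used here, so the following is only a minimal sketch under that assumption; the variant IDs, effect sizes, and the function name `polygenic_risk_score` are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant).

    Assumption: variants missing from `dosages` are skipped rather than
    imputed. A real pipeline would handle missing genotypes explicitly.
    """
    return sum(weights[v] * dosages[v] for v in weights if v in dosages)


# Hypothetical per-variant effect sizes (not taken from any real GWAS).
weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 1.0}

# Hypothetical genotype for one individual: copies of the effect allele.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(person, weights)
print(score)  # 0.5*2 + (-0.25)*1 + 1.0*0 = 0.75
```

In practice, raw scores are usually standardized within the cohort before being split into quartiles or entered into a model as a covariate.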

Results: Pregnant individuals with an SBP PRS in the top quartile had higher blood pressures throughout pregnancy than those with an SBP PRS in the lowest quartile. In the first trimester, the most predictive model was xgboost, with an area under the curve (AUC) of 0.73. Adding the SBP PRS improved performance only for the linear regression model (AUC 0.70 to 0.71); the predictive power of the other models remained unchanged. In late pregnancy, with data obtained up to the delivery admission, the best-performing model was xgboost using clinical variables, which achieved an AUC of 0.91.
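For readers unfamiliar with the metric reported above, the AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The sketch below computes it directly from that definition. The labels and risk values are made-up toy data, not study results, and `roc_auc` is a hypothetical helper rather than the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    labels: 1 for cases, 0 for non-cases.
    scores: predicted risks, higher meaning more likely to be a case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    # Count a win when the case outranks the non-case, half a win on ties.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: two non-cases followed by two cases.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 case/non-case pairs are ranked correctly: 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which puts the reported change from 0.70 to 0.71 in context as a small one.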

Conclusions: Integrating clinical and genetic factors into predictive models can inform personalized preeclampsia risk and achieve higher predictive power than current practice. In the future, personalized tools can be implemented in clinical practice to identify high-risk patients for preventative therapies and timely intervention, with the aim of reducing adverse maternal and neonatal outcomes.

Publication types

  • Preprint