Integrating machine learning with electronic health record data to facilitate detection of prolactin level and pharmacovigilance signals in olanzapine-treated patients

Front Endocrinol (Lausanne). 2022 Oct 13:13:1011492. doi: 10.3389/fendo.2022.1011492. eCollection 2022.

Abstract

Background and aim: Available evidence suggests elevated serum prolactin (PRL) levels in olanzapine (OLZ)-treated patients with schizophrenia. However, machine learning (ML)-based comprehensive evaluations of the influence of pathophysiological and pharmacological factors on PRL levels in OLZ-treated patients are rare. We aimed to forecast the PRL level in OLZ-treated patients and mine pharmacovigilance information on PRL-related adverse events by integrating ML and electronic health record (EHR) data.

Methods: Data were extracted from an EHR system and preprocessed to construct an ML dataset in the form of a 672×384 matrix, which was then randomly divided (8:2) into a derivation cohort for model development and a validation cohort for model validation. The eXtreme gradient boosting (XGBoost) algorithm was used to build the ML models, and SHapley Additive exPlanations (SHAP)-based analyses were used to illustrate feature importance and the models' predictive behavior. A sequential forward feature selection approach was used to generate the optimal feature subset. Co-administered drugs that SHAP analyses identified as potentially influencing PRL levels during OLZ treatment were then compared with evidence from disproportionality analyses performed with OpenVigil FDA.
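The random 8:2 split and the sequential forward feature selection described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `split_indices`, `forward_select`, `toy_error`, and the gain values are all hypothetical, and the toy error function stands in for the validation error of a fitted model (in the study, presumably XGBoost).

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def split_indices(n_rows: int, train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split row indices into derivation and validation sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def forward_select(features: Sequence[str],
                   error_fn: Callable[[List[str]], float],
                   max_features: Optional[int] = None) -> Tuple[List[str], float]:
    """Greedy sequential forward selection; error_fn returns an error (lower is better)."""
    selected: List[str] = []
    remaining = list(features)
    best_error = float("inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every one-feature extension of the current subset.
        scored = [(error_fn(selected + [f]), f) for f in remaining]
        candidate_error, candidate = min(scored)
        if candidate_error >= best_error:
            break  # no remaining feature lowers the error further
        selected.append(candidate)
        remaining.remove(candidate)
        best_error = candidate_error
    return selected, best_error


# Hypothetical per-feature gains; a real run would refit a model per subset.
TOY_GAIN = {"gender_male": 0.30, "risperidone": 0.20, "age": 0.10,
            "noise_a": 0.005, "noise_b": 0.0}


def toy_error(subset: List[str]) -> float:
    """Stand-in for validation error: gains lower it, each feature costs 0.01."""
    return 1.0 - sum(TOY_GAIN[f] for f in subset) + 0.01 * len(subset)


# 672 rows matches the dataset size given in the abstract.
derivation_idx, validation_idx = split_indices(672)
selected, best_error = forward_select(list(TOY_GAIN), toy_error)
print(selected)
```

With these toy gains the loop stops once adding a feature no longer lowers the error, so the two noise features are never selected.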

Results: The 15 features that made the greatest contributions, as ranked by mean(|SHAP value|), were identified as the optimal feature subset. The features were gender_male, co-administration of risperidone, age, co-administration of aripiprazole, concentration of aripiprazole, concentration of OLZ, progesterone, co-administration of sulpiride, creatine kinase, serum sodium, serum phosphorus, testosterone, platelet distribution width, α-L-fucosidase, and lipoprotein (a). The XGBoost model after feature selection delivered good performance on the validation cohort with a mean absolute error of 0.046, mean squared error of 0.0036, root-mean-squared error of 0.060, and mean relative error of 11%. Risperidone and aripiprazole exhibited the strongest associations with hyperprolactinemia and decreased blood PRL, respectively, according to the disproportionality analyses, and both were identified as co-administered drugs that influenced PRL levels during OLZ treatment by SHAP analyses.
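The four error metrics reported above can be computed as in the sketch below. The abstract does not define the mean relative error, so the definition used here (mean of |prediction − truth| / |truth|) is an assumption; the function name and the sample values are hypothetical and are not study data.

```python
import math
from typing import Dict, Sequence


def regression_errors(y_true: Sequence[float],
                      y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MAE, MSE, RMSE and mean relative error for paired values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("relative error is undefined when a true value is zero")
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE by definition
    # Assumed definition of mean relative error (not stated in the abstract).
    mre = sum(abs(r) / abs(t) for r, t in zip(residuals, y_true)) / n
    return {"mae": mae, "mse": mse, "rmse": rmse, "mre": mre}


# Hypothetical target values and predictions, for illustration only.
errors = regression_errors([0.40, 0.50, 0.25], [0.44, 0.45, 0.30])
print(errors)
```

Because RMSE is the square root of MSE, the reported values are mutually consistent: the square root of 0.0036 is 0.060.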

Conclusions: Multiple pathophysiological and pharmacological confounders influence PRL levels associated with effective treatment and PRL-related side effects in OLZ-treated patients. Our study highlights the feasibility of integrating ML and EHR data to facilitate the detection of PRL levels and pharmacovigilance signals in OLZ-treated patients.
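For readers unfamiliar with how a pharmacovigilance signal is quantified, the sketch below computes a reporting odds ratio (ROR) with a 95% confidence interval from a 2×2 table of spontaneous reports. The abstract does not say which disproportionality statistic was used, so the ROR is shown only as one common choice; the function name and the report counts are hypothetical.

```python
import math
from typing import Tuple


def reporting_odds_ratio(a: int, b: int, c: int, d: int,
                         z: float = 1.96) -> Tuple[float, float, float]:
    """ROR and confidence bounds from a 2x2 report table.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR) from the usual log odds ratio formula.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper


# Hypothetical counts, for illustration only (not taken from the study).
ror, lower, upper = reporting_odds_ratio(a=40, b=960, c=200, d=98800)
print(f"ROR = {ror:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

A lower confidence bound above 1 is a commonly used criterion for flagging a drug-event pair as a potential signal.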

Keywords: SHAP; XGBoost; electronic health record; hyperprolactinemia; machine learning; olanzapine; pharmacovigilance; prolactin.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Antipsychotic Agents* / adverse effects
  • Aripiprazole
  • Benzodiazepines / adverse effects
  • Electronic Health Records
  • Humans
  • Machine Learning
  • Male
  • Olanzapine / adverse effects
  • Pharmacovigilance
  • Prolactin
  • Risperidone* / adverse effects

Substances

  • Olanzapine
  • Risperidone
  • Prolactin
  • Antipsychotic Agents
  • Aripiprazole
  • Benzodiazepines