Electronic Medical Records as Input to Predict Postoperative Immediate Remission of Cushing's Disease: Application of Word Embedding

Front Oncol. 2021 Oct 13:11:754882. doi: 10.3389/fonc.2021.754882. eCollection 2021.

Abstract

Background: No existing machine learning (ML)-based models use free text from electronic medical records (EMR) as input to predict immediate remission (IR) of Cushing's disease (CD) after transsphenoidal surgery.

Purpose: The aim of this study was to develop an ML-based model that uses EMR, including both structured features and free text, as input to preoperatively predict IR after transsphenoidal surgery.

Methods: A total of 419 patients with CD from Peking Union Medical College Hospital were enrolled between January 2014 and August 2020. The free text of the patients' EMR was embedded, that is, transformed into low-dimensional dense vectors, so that it could be included together with structured features in four ML-based models. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC).
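The pipeline described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the embedding method, the four models, or the validation scheme. The sketch assumes a simple averaged word-vector embedding, a plain gradient-descent logistic regression, and a rank-based AUC. The names `TOY_VECTORS`, `SYNTHETIC_ROWS`, `embed_text`, `build_features`, `fit_logistic`, `roc_auc`, and `demo` are hypothetical, and the rows are synthetic with labels that carry no clinical meaning.

```python
import math

# Hypothetical toy word vectors (not from the study). A real pipeline would
# learn them from a clinical corpus; the abstract does not name the method.
TOY_VECTORS = {
    "macroadenoma": [0.9, 0.1],
    "invasion": [0.8, 0.3],
    "microadenoma": [0.1, 0.9],
    "confined": [0.2, 0.8],
}
DIM = 2


def embed_text(text, vectors=TOY_VECTORS, dim=DIM):
    """Average the vectors of known words into one dense vector per note."""
    hits = [vectors[w] for w in text.lower().split() if w in vectors]
    if not hits:
        return [0.0] * dim  # no known word: fall back to a zero vector
    return [sum(col) / len(hits) for col in zip(*hits)]


def build_features(structured, text):
    """Concatenate structured features with the free-text embedding."""
    return list(structured) + embed_text(text)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [p - yi for p, yi in zip(predict_proba(X, w, b), y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic rows: (structured features, free-text note, made-up label).
SYNTHETIC_ROWS = [
    ([1.0], "microadenoma confined", 1),
    ([0.0], "microadenoma", 1),
    ([1.0], "macroadenoma invasion", 0),
    ([0.0], "invasion", 0),
]


def demo():
    """Fit on the synthetic rows and return the in-sample AUC."""
    X = [build_features(s, t) for s, t, _ in SYNTHETIC_ROWS]
    y = [label for _, _, label in SYNTHETIC_ROWS]
    w, b = fit_logistic(X, y)
    return roc_auc(y, predict_proba(X, w, b))
```

Calling `demo()` scores the same rows the model was fitted on, which is only meant to show how the pieces connect. An AUC comparable to those reported in the Results would require held-out data or cross-validation, which the abstract does not detail.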

Results: The overall remission rate of the 419 patients was 75.7%. In multivariate logistic regression analysis, operation (p < 0.001), invasion of the cavernous sinus on MRI (p = 0.046), and ACTH (p = 0.024) were significantly associated with IR. The AUC values for the four ML-based models ranged from 0.686 to 0.793. The highest AUC value (0.793) was achieved by logistic regression when 11 structured features and "individual conclusions of the case by doctor" were included.

Conclusion: An ML-based model was developed using both structured and unstructured features (after being processed using a word embedding method) as input to preoperatively predict postoperative IR.

Keywords: Cushing’s disease; immediate remission; machine learning; natural language processing; preoperative prediction.