MetBERT: a generalizable and pre-trained deep learning model for the prediction of metastatic cancer from clinical notes

AMIA Jt Summits Transl Sci Proc. 2022 May 23;2022:331-338. eCollection 2022.

Abstract

Distant metastasis is the major cause of cancer-related deaths; however, early diagnosis of cancer metastasis remains a significant challenge. Recent advances in pre-trained natural language processing models, coupled with the accumulation of publicly available Electronic Health Records (EHR) data, provide an unprecedented opportunity to tackle this challenge computationally. Here, we fine-tuned multiple state-of-the-art BERT-based models on discharge summaries from the open MIMIC-III dataset and derived MetBERT, a novel model tailored to predict cancer metastasis from clinical notes. MetBERT achieved high performance (AUC=0.94) on our in-house validation dataset, suggesting that it generalizes well. In addition, MetBERT made it possible to determine the date of cancer metastasis from the rich information in clinical notes and could therefore potentially be deployed as a tool for early diagnosis. Finally, we interpreted MetBERT at different scales and revealed a possible association between radiation therapy and metastasis risk in multiple cancer types.
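
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an AUC value such as the one above can be computed from note-level predictions. It is not the authors' evaluation code: the function name `roc_auc`, the toy labels, and the toy scores are all hypothetical and chosen only for illustration. The sketch uses the pairwise (Mann-Whitney) definition of AUC, i.e. the fraction of positive/negative note pairs in which the metastatic note receives the higher score.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive note ranked above negative note
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = metastatic, 0 = non-metastatic.
# Scores stand in for a classifier's predicted probability per note.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.91, 0.20, 0.65, 0.40, 0.55, 0.10]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The quadratic pairwise loop is used here for clarity; for datasets of realistic size, a rank-based implementation from an established library would be the more practical choice.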