Machine Learning Models for Pancreatic Cancer Risk Prediction Using Electronic Health Record Data - A Systematic Review and Assessment

Anup Kumar Mishra; Bradford Chong; Shivaram P Arunachalam; Ann L Oberg; Shounak Majumder

doi:10.14309/ajg.0000000000002870


Am J Gastroenterol. 2024 May 16. doi: 10.14309/ajg.0000000000002870. Online ahead of print.

Authors

Anup Kumar Mishra¹, Bradford Chong¹, Shivaram P Arunachalam¹, Ann L Oberg², Shounak Majumder¹

Affiliations

¹ Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA.
² Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA.

PMID: 38752654

Abstract

Introduction: Accurate risk prediction can facilitate screening and early detection of pancreatic cancer (PC). We conducted a systematic review to critically evaluate the effectiveness of machine learning (ML) and artificial intelligence (AI) techniques applied to electronic health record (EHR) data for PC risk prediction.

Methods: Ovid MEDLINE(R), Ovid EMBASE, Ovid Cochrane Central Register of Controlled Trials, Ovid Cochrane Database of Systematic Reviews, Scopus, and Web of Science were searched for articles published between January 1, 2012, and February 1, 2024, that used ML/AI techniques to predict PC. Study selection and data extraction were conducted by two independent reviewers. Critical appraisal and data extraction were performed using the CHARMS checklist (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies). Risk of bias and applicability were assessed using PROBAST (Prediction model Risk Of Bias ASsessment Tool).

Results: Thirty studies including 169,149 PC cases were identified. Logistic regression was the most frequently used modeling method. Twenty studies used a curated set of known PC risk predictors or predictors identified by clinical experts. ML model discrimination (C-index) ranged from 0.57 to 1.0. Missing data were underreported, and most studies neither implemented explainable-AI techniques nor reported exclusion time intervals.
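To make the reported discrimination range more concrete, the sketch below computes a C-index for a binary outcome (PC case vs. non-case). In that setting the C-index is the fraction of case-control pairs in which the case receives the higher predicted risk, with ties counted as one half. It is equivalent to the area under the ROC curve. A value of 0.5 corresponds to chance and 1.0 to perfect separation. This is an illustrative sketch only, not the method used by the review or by any included study. The function name `c_index_binary` is hypothetical, and the sketch ignores censoring, which time-to-event variants such as Harrell's C account for.

```python
from itertools import product


def c_index_binary(risk_scores, outcomes):
    """Concordance index for a binary outcome (hypothetical helper).

    risk_scores: predicted risks, where a higher value means higher predicted PC risk.
    outcomes: 1 for a PC case, 0 for a non-case.
    Censoring is not handled; this is the binary-outcome (ROC AUC) form.
    """
    if len(risk_scores) != len(outcomes):
        raise ValueError("risk_scores and outcomes must have equal length")
    cases = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    controls = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    # Compare every case with every non-case.
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            concordant += 1.0  # case ranked above non-case: concordant pair
        elif case_score == control_score:
            concordant += 0.5  # tied scores count as half-concordant
    return concordant / (len(cases) * len(controls))


print(c_index_binary([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

In the example, three of the four case-control pairs are ordered correctly, giving 0.75. The C-index values reported in the review (0.57 to 1.0) are on this same 0.5-to-1.0 scale of meaningful discrimination.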

Discussion: AI/ML models for PC risk prediction that use known risk factors perform reasonably well and, if validated in real-world data sets, may have near-term applications in identifying cohorts for targeted PC screening. Combining structured and unstructured EHR data using emerging language models, while incorporating explainable-AI techniques, has the potential to identify novel PC risk factors, and this approach merits further study.
