An Integrative Pancreatic Cancer Risk Prediction Model in the UK Biobank

Biomedicines. 2023 Dec 1;11(12):3206. doi: 10.3390/biomedicines11123206.

Abstract

Pancreatic cancer (PaCa) is a lethal cancer with an increasing incidence, highlighting the need for early prevention strategies. There is a lack of a comprehensive PaCa predictive model derived from large prospective cohorts. Therefore, we have developed an integrated PaCa risk prediction model for PaCa using data from the UK Biobank, incorporating lifestyle-related, genetic-related, and medical history-related variables for application in healthcare settings. We used a machine learning-based random forest approach and a traditional multivariable logistic regression method to develop a PaCa predictive model for different purposes. Additionally, we employed dynamic nomograms to visualize the probability of PaCa risk in the prediction model. The top five influential features in the random forest model were age, PRS, pancreatitis, DM, and smoking. The significant risk variables in the logistic regression model included male gender (OR = 1.17), age (OR = 1.10), non-O blood type (OR = 1.29), higher polygenic score (PRS) (Q5 vs. Q1, OR = 2.03), smoking (OR = 1.82), alcohol consumption (OR = 1.27), pancreatitis (OR = 3.99), diabetes (DM) (OR = 2.57), and gallbladder-related disease (OR = 2.07). The area under the receiver operating curve (AUC) of the logistic regression model is 0.78. Internal validation and calibration performed well in both models. Our integrative PaCa risk prediction model with the PRS effectively stratifies individuals at future risk of PaCa, aiding targeted prevention efforts and supporting community-based cancer prevention initiatives.

Keywords: UK Biobank cohort; nomogram; pancreatic cancer; polygenic score; random forest model; risk prediction model.