Serum biomarker-based early detection of pancreatic ductal adenocarcinomas with ensemble learning

Commun Med (Lond). 2023 Jan 20;3(1):10. doi: 10.1038/s43856-023-00237-5.

Abstract

Background: Earlier detection of pancreatic ductal adenocarcinoma (PDAC) is key to improving patient outcomes, because the disease is mostly detected at advanced stages, which are associated with poor survival. Developing non-invasive blood tests for early detection would be an important breakthrough.

Methods: The primary objective of this work is to use a prospectively collected dataset to quantify a set of cancer-associated proteins and to construct multi-marker models with the capacity to predict PDAC years before diagnosis. The data are part of a nested case-control study within the UK Collaborative Trial of Ovarian Cancer Screening and comprise 218 samples collected from a total of 143 post-menopausal women who were diagnosed with pancreatic cancer within 70 months after sample collection, and 249 matched non-cancer controls. We develop a stacked ensemble modelling technique to achieve robustness in predictions and, therefore, improve performance in newly collected datasets.
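As an illustration of the general idea of stacked ensemble modelling, the following is a minimal sketch using scikit-learn's `StackingClassifier`. It is not the authors' pipeline: the choice of base classifiers, the logistic-regression meta-learner, the 5-fold cross-validation, and the synthetic data from `make_classification` are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a serum biomarker panel (assumption: NOT the study data).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Heterogeneous base classifiers (assumption: chosen only for illustration).
base_learners = [
    ("logit", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
]

# The meta-learner is trained on out-of-fold probabilities from the base
# learners (cv=5), so it never sees predictions made on a model's own
# training samples.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

# Evaluate on held-out samples that played no part in fitting.
scores = stack.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print(f"held-out AUC: {auc:.2f}")
```

Combining classifiers of different types is what allows the meta-learner to exploit disagreement between them; a stack of near-identical models would be expected to add little over any single one.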

Results: Here we show that with ensemble learning we can predict PDAC status with an AUC of 0.91 (95% CI 0.75-1.0) and a sensitivity of 92% (95% CI 0.54-1.0) at 90% specificity up to 1 year prior to diagnosis, and with an AUC of 0.85 (95% CI 0.74-0.93) and a sensitivity of 61% (95% CI 0.17-0.83) at 90% specificity up to 2 years prior to diagnosis.
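To make the reported metrics concrete, the sketch below shows one common way to compute an AUC, a sensitivity at a fixed 90% specificity, and a percentile bootstrap confidence interval from classifier scores. The helper names, the toy labels and scores, and the use of a percentile bootstrap are assumptions for illustration; the abstract does not state how its intervals were obtained.

```python
import random


def auc(labels, scores):
    """Mann-Whitney estimate of the AUC; tied case/control pairs count one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(labels, scores, specificity=0.90):
    """Sensitivity at the lowest threshold that reaches the target specificity.

    A sample is called positive when its score is strictly above the threshold.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    for threshold in sorted(set(neg)):
        achieved = sum(n <= threshold for n in neg) / len(neg)
        if achieved >= specificity:
            return sum(p > threshold for p in pos) / len(pos)


def bootstrap_ci(labels, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumption: not necessarily the paper's method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = [labels[i] for i in idx]
        if 0 < sum(resampled) < n:  # keep only resamples containing both classes
            stats.append(metric(resampled, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Toy data (hypothetical): 4 cases (label 1) and 10 controls (label 0).
labels = [1, 1, 1, 1] + [0] * 10
scores = [0.9, 0.8, 0.6, 0.3,
          0.7, 0.5, 0.4, 0.35, 0.2, 0.2, 0.15, 0.1, 0.1, 0.05]

print(auc(labels, scores))                         # 0.875
print(sensitivity_at_specificity(labels, scores))  # 0.75
print(bootstrap_ci(labels, scores, auc))
```

With only a handful of cases, each bootstrap resample can shift the sensitivity by a large step, which is consistent with the wide intervals that small case numbers tend to produce.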

Conclusions: The ensemble modelling strategy explored here considerably outperforms biomarker combinations cited in the literature. Further developments in the selection of classifiers that balance performance and heterogeneity should enhance the predictive capacity of the method.

Plain language summary

Pancreatic cancers are most frequently detected at an advanced stage. This limits treatment options and contributes to the dismal survival rates currently recorded. The development of new tests that could improve detection of early-stage disease is fundamental to improving outcomes. Here, we use advanced data analysis techniques to devise an early detection test for pancreatic cancer. We use data on markers in the blood from people enrolled on a screening trial. Our test correctly returns a positive result for pancreatic cancer 91% of the time up to 1 year prior to diagnosis, and 78% of the time up to 2 years prior to diagnosis. These results surpass previously reported tests and should encourage further evaluation of the test in different populations, to see whether it should be adopted in the clinic.