Developing and Validating a Lung Cancer Risk Prediction Model: A Nationwide Population-Based Study

Cancers (Basel). 2023 Jan 12;15(2):487. doi: 10.3390/cancers15020487.

Abstract

Lung cancer can be challenging to diagnose in the early stages, when treatment options are optimal. We aimed to develop 1-year prediction models for the individual risk of incident lung cancer for all individuals aged 40 or above living in Denmark on 1 January 2017. The study was conducted using population-based registers on health and sociodemographics from 2007 to 2016. We developed the risk models by logistic regression with backward selection on all variables, applied them to the validation cohort, calculated receiver operating characteristic curves, and estimated the corresponding areas under the curve (AUC). In the populations without and with previously confirmed cancer, 4274/2,826,249 (0.15%) and 482/172,513 (0.3%) individuals received a lung cancer diagnosis in 2017, respectively. For both populations, older age was a relevant predictor, and the most complex models, containing variables related to diagnoses, medication, general practitioner, and specialist contacts, as well as baseline sociodemographic characteristics, had the highest AUC. These models achieved a positive predictive value (PPV) of 0.0127 (0.006) and a negative predictive value (NPV) of 0.989 (0.997) with a 1% cut-off in the population without (with) previous cancer. This corresponds to 1.2% of the screened population receiving a positive prediction, of whom 1.3% would be diagnosed with incident lung cancer. We have developed and tested a prediction model with a reasonable potential to support clinicians and healthcare planners in identifying patients at risk of lung cancer.
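
The evaluation measures named above (AUC, PPV, and NPV at a risk cut-off) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy risks and outcomes, and the choice to flag individuals whose predicted risk is at or above the cut-off are all assumptions made for illustration, and the toy numbers are not study data.

```python
# Illustrative sketch only: toy data, hypothetical function names.

def confusion_at_cutoff(risks, outcomes, cutoff):
    """Count TP, FP, TN, FN when risk >= cutoff counts as a positive prediction (assumed rule)."""
    tp = fp = tn = fn = 0
    for risk, outcome in zip(risks, outcomes):
        flagged = risk >= cutoff
        if flagged and outcome:
            tp += 1
        elif flagged:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV, share of the population with a positive prediction)."""
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    flagged_share = (tp + fp) / (tp + fp + tn + fn)
    return ppv, npv, flagged_share


def auc(risks, outcomes):
    """AUC as the share of case/non-case pairs in which the case has the higher risk (ties count 0.5)."""
    cases = [r for r, y in zip(risks, outcomes) if y]
    controls = [r for r, y in zip(risks, outcomes) if not y]
    score = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(controls))


# Toy predicted 1-year risks and observed outcomes (1 = diagnosed), not study data.
toy_risks = [0.002, 0.015, 0.0005, 0.03, 0.008, 0.012, 0.001, 0.02]
toy_outcomes = [0, 1, 0, 0, 0, 0, 1, 0]

counts = confusion_at_cutoff(toy_risks, toy_outcomes, cutoff=0.01)  # 1% cut-off
ppv, npv, flagged_share = predictive_values(*counts)
toy_auc = auc(toy_risks, toy_outcomes)
print(counts, ppv, npv, flagged_share, toy_auc)
```

In this reading, the PPV is the share of flagged individuals who are diagnosed, and the share flagged is the proportion of the screened population that receives a positive prediction.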

Keywords: automated risk calculation; cancer diagnosis; lung cancer; prediction models; register data; socioeconomic status.