Predictors of underutilization of lung cancer screening: a machine learning approach

Eur J Cancer Prev. 2022 Nov 1;31(6):523-529. doi: 10.1097/CEJ.0000000000000742. Epub 2022 Jan 17.

Abstract

Lung cancer is the second most common cancer and a leading cause of cancer-related death in the US. Despite this, the prevalence of low-dose computed tomography (LDCT) use for lung cancer screening in the US has remained below 4% over time. The purpose of this study was to develop machine learning models to analyze interactive pathways of factors associated with the use of LDCT for lung cancer screening. The study was based on data from the 2018 Behavioral Risk Factor Surveillance System. After handling missing values, 86 variables and 710 samples were included in the decision tree and random forest models. The data were randomly split into training (569/710, 80%) and testing (141/710, 20%) sets. Gini impurity was used to select the optimal split at each node of the model. Model performance was evaluated by accuracy, sensitivity, specificity, and F1 score, among other metrics. Averaged over 100 runs, the decision tree model achieved an accuracy of 67.78%, an F1 score of 65.76%, a sensitivity of 62.52%, and a specificity of 73.57%. In the decision tree model, nine interactive pathways were identified among the following factors: average drinks per month, BMI, diabetes, first smoke age, years of smoking, year(s) quit smoking, sex, last sigmoidoscopy or colonoscopy, last dental visit, general health, insurance, education, and last Pap test. Lung cancer screening utilization results from the interplay of multiple factors. Lung cancer screening programs in clinical settings should not only focus on patients' smoking behaviors but also consider other socioeconomic factors.
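The two technical ingredients named in the abstract, Gini-impurity-based split selection and the reported evaluation metrics, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`gini_impurity`, `split_impurity`, `best_threshold`, `binary_metrics`) and the toy values are hypothetical and are not drawn from the BRFSS data. The sketch assumes a single numeric feature and a binary outcome (1 = screened, 0 = not screened).

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the two children of a threshold split."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(values, labels):
    """Pick the midpoint between consecutive distinct values with the lowest
    weighted impurity. Assumes at least two distinct feature values."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical toy data (not from the study): years of smoking vs. screening use.
years_smoked = [5, 10, 20, 30, 35, 40]
screened = [0, 0, 0, 1, 1, 1]
threshold = best_threshold(years_smoked, screened)

# Hypothetical predictions on a held-out set, scored with the abstract's metrics.
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
print(threshold, metrics)
```

A full decision tree applies this split search recursively across all candidate features. Averaging the metrics over repeated random train/test splits, as the abstract describes for its 100 runs, would wrap the fit-and-score step in a loop and take the mean of each metric.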

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Early Detection of Cancer* / methods
  • Humans
  • Lung Neoplasms* / diagnostic imaging
  • Lung Neoplasms* / epidemiology
  • Machine Learning
  • Mass Screening
  • Smoke
  • Tomography, X-Ray Computed / methods

Substances

  • Smoke