Identification of non-small cell lung cancer with chronic obstructive pulmonary disease using clinical symptoms and routine examination: a retrospective study

Front Oncol. 2023 Jul 28:13:1158948. doi: 10.3389/fonc.2023.1158948. eCollection 2023.

Abstract

Background: Patients with non-small cell lung cancer (NSCLC) and patients with NSCLC combined with chronic obstructive pulmonary disease (COPD) have similar physiological conditions in the early stages, but the latter have shorter survival times and higher mortality rates. The purpose of this study was to develop and compare machine learning models to identify future diagnoses of NSCLC combined with COPD based on patients' disease and routine clinical data.

Methods: Data were obtained from 237 patients with either NSCLC combined with COPD or NSCLC alone who were admitted to Ningxia Hui Autonomous Region People's Hospital from October 2013 to July 2022. Six machine learning algorithms (K-nearest neighbor, logistic regression, eXtreme gradient boosting, support vector machine, naïve Bayes, and artificial neural network) were used to develop prediction models for NSCLC combined with COPD. Sensitivity, specificity, positive predictive value, negative predictive value, accuracy, F1 score, Matthews correlation coefficient (MCC), Kappa, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC) were used as indicators to evaluate model performance.
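
As an illustration of how the threshold-based indicators listed above relate to one another, the following minimal Python sketch computes them from the four cells of a binary confusion matrix. It is not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, the sketch assumes every row and column of the confusion matrix is non-empty (so no denominator is zero), and AUROC and AUPRC are omitted because they require the models' continuous scores rather than hard predictions.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes all marginal totals
    (tp+fn, tn+fp, tp+fp, tn+fn) are greater than zero.
    """
    total = tp + fp + tn + fn

    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / total

    # F1 score: harmonic mean of precision and recall.
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = (
        (tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)
    ) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts for demonstration only; not data from the study.
    for name, value in binary_metrics(tp=40, fp=5, tn=45, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Reporting MCC and Kappa alongside accuracy is useful when the two classes differ in size, since both correct for agreement that would be expected by chance.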

Results: A total of 135 patients with NSCLC combined with COPD and 102 patients with NSCLC alone were included in the study. The results showed that pulmonary function and emphysema were important risk factors and that the support vector machine-based identification model performed best (accuracy: 0.946, recall: 0.940, specificity: 0.955, precision: 0.972, NPV: 0.920, F1 score: 0.954, MCC: 0.893, Kappa: 0.888, AUROC: 0.975, AUPRC: 0.987).

Conclusion: The use of machine learning tools combining clinical symptoms and routine examination data features is suitable for identifying the risk of concurrent NSCLC in COPD patients.

Keywords: COPD; NSCLC; detection; emphysema; identification; machine learning; pulmonary function.

Grants and funding

This research was funded by the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20201183); the “innovative and entrepreneurial talent” in Jiangsu Province (JSSCRC2021568); the “distinguished medical expert” in Jiangsu Province (JSTPYXZJ2021006); the National Natural Science Foundation of China (No. 82160017); and the Natural Science Foundation of Ningxia Hui Autonomous Region (No. 2022AAC03351).