Development and Validation of Novel Deep-Learning Models Using Multiple Data Types for Lung Cancer Survival

Jason C Hsu; Phung-Anh Nguyen; Phan Thanh Phuc; Tsai-Chih Lo; Min-Huei Hsu; Min-Shu Hsieh; Nguyen Quoc Khanh Le; Chi-Tsun Cheng; Tzu-Hao Chang; Cheng-Yu Chen

doi:10.3390/cancers14225562

Cancers (Basel). 2022 Nov 12;14(22):5562. doi: 10.3390/cancers14225562.

Authors

Jason C Hsu^{1,2,3,4}, Phung-Anh Nguyen^{1,2,3}, Phan Thanh Phuc⁴, Tsai-Chih Lo⁵, Min-Huei Hsu^{6,7}, Min-Shu Hsieh^{8,9}, Nguyen Quoc Khanh Le^{10,11}, Chi-Tsun Cheng³, Tzu-Hao Chang^{2,5}, Cheng-Yu Chen^{11,12}

Affiliations

¹ Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei 110, Taiwan.
² Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei Medical University, Taipei 110, Taiwan.
³ Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei 110, Taiwan.
⁴ International Ph.D. Program in Biotech and Healthcare Management, College of Management, Taipei Medical University, Taipei 110, Taiwan.
⁵ Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, 250 Wu-Hsing Str., Xinyi Dist., Taipei 110, Taiwan.
⁶ Office of Data Science, Taipei Medical University, Taipei 110, Taiwan.
⁷ Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei 110, Taiwan.
⁸ Department of Pathology, National Taiwan University Hospital, Taipei 100, Taiwan.
⁹ Graduate Institute of Pathology, College of Medicine, National Taiwan University, Taipei 100, Taiwan.
¹⁰ Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan.
¹¹ Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 110, Taiwan.
¹² Department of Radiology, College of Medicine, Taipei Medical University, 250 Wu-Hsing Str., Xinyi Dist., Taipei 110, Taiwan.

Abstract

The literature lacks a well-established lung cancer survival prediction model that combines multiple data types, multiple novel machine-learning algorithms, and external testing. This study aims to address this gap and to determine the critical factors of lung cancer survival. We selected non-small-cell lung cancer patients from a retrospective dataset of the Taipei Medical University Clinical Research Database and the Taiwan Cancer Registry, covering January 2008 to December 2018. All patients were followed from the index date of cancer diagnosis until death. Predictor variables included demographics, comorbidities, medications, laboratory results, and patient gene tests. Nine machine-learning algorithms with various modes were used, and their performance was measured by the area under the receiver operating characteristic curve (AUC). In total, 3714 patients were included. The artificial neural network (ANN) model performed best when integrating all variables, with an AUC, accuracy, precision, recall, and F1-score of 0.89, 0.82, 0.91, 0.75, and 0.65, respectively. The most important features were cancer stage, cancer size, age at diagnosis, smoking status, drinking status, EGFR gene, and body mass index. Overall, integrating different data types improved the predictive performance of the ANN model.
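As an illustration of the evaluation metrics named in the abstract, the following minimal Python sketch computes AUC, accuracy, precision, recall, and F1-score for a binary outcome. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are assumptions chosen for the example, and the numbers it produces are unrelated to the study's results.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true outcomes and model-predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

AUC is computed from the predicted scores and does not depend on a threshold, whereas accuracy, precision, recall, and F1-score all depend on the cut-off used to turn scores into class labels.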

Keywords: artificial intelligence; lung cancer; machine learning; prediction models; real-world data; survival.

Grants and funding

This study was supported by Taiwan Ministry of Science and Technology grants (grant numbers: MOST109-2321-B-038-004; MOST110-2321-B-038-004). The funders had no role in the study design, data collection and analysis, publication decision, or manuscript preparation.