A multi-classification model for non-small cell lung cancer subtypes based on independent subtask learning

Med Phys. 2022 Nov;49(11):6960-6974. doi: 10.1002/mp.15808. Epub 2022 Jun 26.

Abstract

Purpose: Non-small cell lung cancer (NSCLC) can be divided into adenocarcinoma (ADC), squamous cell carcinoma (SCC), large cell carcinoma (LCC), and not otherwise specified (NOS), and distinguishing these subtypes is crucial for clinical decision-making. However, studies of this complex multi-classification problem are rare, mainly because of severe data imbalance, the difficulty of unifying the feature space, and the complicated decision boundaries among multiple subtypes. Traditional "one-vs-one" (OVO) machine learning methods struggle to solve these problems and achieve good results.

Methods: To this end, we propose a novel independent subtask learning (ISTL) method to better carry out the multi-classification task. Specifically, it includes four targeted strategies: (1) independent data expansion; (2) independent feature selection (IFS); (3) independent model construction; and (4) a novel voting strategy that combines majority voting with a Bayesian prior. We performed experiments using 1036 CT scans (ADC:SCC:LCC:NOS = 600:268:105:63) collected from eight international databases, for which data acquisition was highly complex and diverse.
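
As a rough illustration of strategy (4), the sketch below decomposes the four subtypes into one-vs-one binary tasks and combines the pairwise decisions by majority vote, using class priors estimated from the cohort counts given above. The abstract does not state how the Bayesian prior enters the vote, so the tie-break rule here is an assumption, and the function names and the example pairwise outputs are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Class counts reported in the abstract (ADC:SCC:LCC:NOS = 600:268:105:63).
CLASS_COUNTS = {"ADC": 600, "SCC": 268, "LCC": 105, "NOS": 63}


def class_priors(counts):
    """Empirical prior P(class) estimated from class frequencies."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def binary_tasks(labels):
    """All one-vs-one pairs; four subtypes give six binary tasks."""
    return list(combinations(labels, 2))


def vote_with_prior(pair_winners, priors):
    """Majority vote over pairwise winners.

    ASSUMPTION: ties are broken in favour of the class with the higher
    prior. This is an illustrative rule, not the paper's stated method.
    """
    votes = Counter(pair_winners.values())
    top = max(votes.values())
    tied = [label for label, n in votes.items() if n == top]
    return max(tied, key=lambda label: priors[label])


priors = class_priors(CLASS_COUNTS)
tasks = binary_tasks(list(CLASS_COUNTS))

# Hypothetical pairwise outputs for one scan: ADC and SCC each win two tasks.
example = {
    ("ADC", "SCC"): "SCC",
    ("ADC", "LCC"): "ADC",
    ("ADC", "NOS"): "ADC",
    ("SCC", "LCC"): "SCC",
    ("SCC", "NOS"): "NOS",
    ("LCC", "NOS"): "LCC",
}
print(len(tasks), vote_with_prior(example, priors))
```

In this made-up case ADC and SCC tie on votes, and the assumed rule resolves the tie toward ADC because it is the more frequent subtype in the cohort. In the ISTL setting, each of the six pairwise winners would come from a model with its own expanded data and its own selected features.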

Results: The ISTL method obtained an accuracy of 0.812 on the independent test cohort, significantly improving multi-classification performance compared with the traditional OVO-support vector machine (0.691) and OVO-random forest (0.710) models. After IFS, the feature sets selected for the six binary tasks were clearly different, indicating that the ISTL method has better interpretability in distinguishing the multiple NSCLC subtypes. A further auxiliary contrast experiment showed that all four strategies were effective.

Conclusion: Our work indicates that the ISTL method can effectively perform multi-classification of NSCLC subtypes with better interpretability for clinical computer-aided detection and has the potential to be applied in a wide range of multi-classification studies.

Keywords: machine learning; multi-classification; non-small cell lung cancer; one-vs-one (OVO); radiomics.

MeSH terms

  • Bayes Theorem
  • Carcinoma, Non-Small-Cell Lung* / diagnostic imaging
  • Humans
  • Lung Neoplasms* / diagnostic imaging