A CBR framework with gradient boosting based feature selection for lung cancer subtype classification

Comput Biol Med. 2017 Jul 1:86:98-106. doi: 10.1016/j.compbiomed.2017.05.010. Epub 2017 May 13.

Abstract

Molecular subtype classification is a challenging problem in lung cancer diagnosis. Although different methods have been proposed for biomarker selection, efficiently discriminating between adenocarcinoma and squamous cell carcinoma in clinical practice remains difficult, especially when the latter is poorly differentiated. This is an area of growing importance, since certain treatments and other medical decisions are based on molecular and histological features. An urgent need exists for a system and a set of biomarkers that provide an accurate diagnosis. In this paper, a novel Case-Based Reasoning framework with gradient boosting based feature selection is proposed and applied to the task of discriminating squamous cell carcinoma from adenocarcinoma, aiming to provide an accurate diagnosis with a reduced set of genes. The proposed method was trained and evaluated on two independent datasets to validate its generalization capability. Furthermore, it achieved accuracy rates greater than those of traditional microarray analysis techniques, while incorporating the advantages inherent to the Case-Based Reasoning methodology (e.g. learning over time, adaptability, and interpretability of solutions).

Keywords: Biomarker; Case-based reasoning; Gradient boosting; Microarray; NSCLC.
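
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: rank genes with a gradient boosting model, keep a reduced set, and classify new samples by retrieving similar past cases. Everything specific here is an assumption made for illustration: the synthetic "expression" data, the use of decision stumps with squared-error loss as the boosting model, the choice of keeping 4 genes, the k = 3 nearest-case vote, and all function names (make_data, fit_stump, boosted_importance, retrieve, classify). The expert "revise" step of the Case-Based Reasoning cycle is omitted.

```python
import math
import random
from collections import Counter

def make_data(n, n_genes, informative, rng):
    """Toy expression matrix (NOT real data); label 0 = ADC, 1 = SCC."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 2.5 * label  # shift informative genes for class 1
        X.append(row)
        y.append(label)
    return X, y

def fit_stump(X, r):
    """Best single-gene threshold split on residuals r (variance-reduction gain)."""
    n, d = len(X), len(X[0])
    total = sum(r)
    best = None  # (gain, gene, threshold, left_value, right_value)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no valid threshold between equal values
            right_sum = total - left_sum
            gain = left_sum**2 / k + right_sum**2 / (n - k) - total**2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best

def boosted_importance(X, y, rounds=30, lr=0.3):
    """Squared-loss gradient boosting with stumps; sums split gain per gene."""
    n, d = len(X), len(X[0])
    pred = [sum(y) / n] * n
    importance = [0.0] * d
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, left_val, right_val = stump
        importance[j] += gain
        pred = [p + lr * (left_val if x[j] <= thr else right_val)
                for x, p in zip(X, pred)]
    return importance

def retrieve(case_base, query, genes, k=3):
    """Retrieve: the k stored cases closest to the query on the selected genes."""
    q = [query[g] for g in genes]
    return sorted(case_base,
                  key=lambda case: math.dist([case[0][g] for g in genes], q))[:k]

def classify(case_base, query, genes, k=3):
    """Reuse: majority label among the retrieved cases."""
    votes = Counter(label for _, label in retrieve(case_base, query, genes, k))
    return votes.most_common(1)[0][0]

rng = random.Random(0)
informative = [3, 11]  # hypothetical "true" marker genes in the toy data
X_train, y_train = make_data(100, 20, informative, rng)
X_test, y_test = make_data(30, 20, informative, rng)

# Feature selection uses the training cases only.
importance = boosted_importance(X_train, y_train)
selected = sorted(range(20), key=lambda j: importance[j], reverse=True)[:4]

case_base = list(zip(X_train, y_train))
correct = 0
for x, label in zip(X_test, y_test):
    correct += classify(case_base, x, selected) == label
    case_base.append((x, label))  # Retain: store the confirmed case for later use
accuracy = correct / len(X_test)

print("selected genes:", selected)
print("accuracy:", accuracy)
```

In this sketch the retrieved cases double as the explanation for each prediction, and appending confirmed cases to case_base is what lets the system learn over time without retraining, which are the kinds of advantages the abstract attributes to Case-Based Reasoning.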

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenocarcinoma* / classification
  • Adenocarcinoma* / genetics
  • Adenocarcinoma* / metabolism
  • Carcinoma, Squamous Cell* / classification
  • Carcinoma, Squamous Cell* / genetics
  • Carcinoma, Squamous Cell* / metabolism
  • Databases, Genetic*
  • Gene Expression Regulation, Neoplastic*
  • Genes, Neoplasm*
  • Humans
  • Lung Neoplasms* / classification
  • Lung Neoplasms* / genetics
  • Lung Neoplasms* / metabolism