Computer-aided diagnosis for early-stage lung cancer based on longitudinal and balanced data

PLoS One. 2013 May 15;8(5):e63559. doi: 10.1371/journal.pone.0063559. Print 2013.

Abstract

Background: Lung cancer is one of the most common forms of cancer, resulting in over a million deaths per year worldwide. One way to address this problem is to develop more discriminative diagnostic methods. In this paper, computer-aided diagnosis was used to facilitate the prediction of characteristics of solitary pulmonary nodules in lung CT images in order to diagnose early-stage lung cancer.

Methods: The synthetic minority over-sampling technique (SMOTE) was applied to the raw data to balance the original training data set. Curvelet-transformation textural features, together with 3 patient demographic characteristics and 9 morphological features, were used to establish a support vector machine (SVM) prediction model. Longitudinal data were used as the test data set to evaluate classification performance in predicting early-stage lung cancer.
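
The balancing step described above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic case is placed at a random point on the line segment between a minority-class case and one of its nearest minority-class neighbours. The function name `smote`, the parameters `n_new`, `k` and `seed`, the use of Euclidean distance, and the toy feature vectors are all assumptions made for illustration; they are not taken from the study, and this is not the authors' implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data (not from the study): 6 majority cases, 2 minority cases.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4], [0.4, 0.2]]
minority = [[1.0, 1.0], [1.2, 0.8]]

# Oversample the minority class until the two classes are 1:1.
balanced_minority = minority + smote(minority, len(majority) - len(minority))
```

Only the training set would be oversampled in this way; a held-out test set, such as the longitudinal data here, is left untouched so that performance is measured on real cases.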

Results: With SMOTE as a pre-processing step, the original training data were balanced to a malignant-to-benign case ratio of 1∶1. Accuracy based on cross-evaluation was 80% for the original unbalanced data and 97% for the balanced data. Based on Curvelet-transformation textural features and the other features, the SVM prediction model had good classification performance for early-stage lung cancer, with an area under the curve of 0.949 (P<0.001). The textural feature (standard deviation) showed that benign cases had a larger change during the follow-up period than malignant cases.
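
For readers unfamiliar with the area under the curve reported above, the following is a minimal sketch of how an AUC can be computed from classifier scores: it is the fraction of malignant/benign pairs in which the malignant case receives the higher score, with ties counted as one half. The function name `roc_auc` and the labels and scores are hypothetical and only illustrate the calculation; they do not reproduce the study's data or its statistical test.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical decision scores (1 = malignant, 0 = benign); not data from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

In this toy example 8 of the 9 malignant/benign pairs are ranked correctly, so the AUC is 8/9; a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.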

Conclusions: With textural features extracted from a Curvelet transformation and other parameters, a sensitive support vector machine prediction model can increase the rate of diagnosis for early-stage lung cancer. This scheme can be used as an auxiliary tool to differentiate between benign and malignant early-stage lung cancers in CT images.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Diagnosis, Computer-Assisted*
  • Early Detection of Cancer
  • Humans
  • Longitudinal Studies
  • Lung Neoplasms / diagnosis*
  • Models, Theoretical

Grants and funding

This work was supported by the Natural Science Fund of China (Serial Number: 81172772); the Natural Science Fund of Beijing (Serial Number: 4112015); and the Program of Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality (Serial Number: PHR201007112). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.