Classification of individual lung cancer cell lines based on DNA methylation markers: use of linear discriminant analysis and artificial neural networks

J Mol Diagn. 2004 Feb;6(1):28-36. doi: 10.1016/S1525-1578(10)60488-6.

Abstract

The classification of small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) can pose diagnostic problems due to inter-observer variability and other limitations of histopathology. There is interest in developing classificatory models of lung neoplasms based on the analysis of multivariate molecular data with statistical methods and/or neural networks. DNA methylation levels at 20 loci were measured in 41 SCLC and 46 NSCLC cell lines with the quantitative real-time PCR method MethyLight. The data were analyzed with artificial neural networks (ANN) and linear discriminant analysis (LDA) to classify each cell line as SCLC or NSCLC. Models used either data from all 20 loci, or from five significant DNA methylation loci that were selected by a step-wise back-propagation procedure (PTGS2, CALCA, MTHFR, ESR1, and CDKN2A). The data were sorted randomly by cell line into 10 different data sets, each with training and testing subsets composed of 71 and 16 of the cases, respectively. Ten ANN models were trained using the 10 data sets: five using 20 variables, and five using the five variables selected by step-wise back-propagation. The ANN models with 20 input variables correctly classified 100% of the cell lines, while the models with only five variables correctly classified 87% to 100% of cases. For comparison, 10 different LDA models were trained and tested using the same data sets with either the original data or with logarithmically transformed data. Again, half of the models used all 20 variables while the others used only the five significant variables. LDA models provided correct classifications in 62.5% to 87.5% of cases. The classifications provided by all of the different models were compared with kappa statistics, yielding kappa values ranging from 0.25 to 1.0.
We conclude that ANN models based on DNA methylation profiles can objectively classify SCLC and NSCLC cell lines with substantial to perfect concordance, while LDA models based on DNA methylation profiles provide poor to substantial concordance. Our work supports the promise of ANN analysis of DNA methylation data as a powerful approach for the development of automated methods for lung cancer classification.
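
As an illustration of two steps named in the abstract, the sketch below shows one way to draw 10 random 71/16 training/testing partitions of the 87 cell lines and to compute a kappa statistic between two sets of classifications. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say which kappa variant was used (Cohen's kappa for two raters is assumed here), and the function names, the seed, the placeholder case identifiers, and the example labels are all hypothetical.

```python
import random
from collections import Counter


def random_splits(case_ids, n_splits=10, n_train=71, seed=0):
    """Return n_splits random (train, test) partitions of case_ids.

    Assumption: each data set is an independent random shuffle of all cases;
    the abstract only states that the data were sorted randomly by cell line.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        shuffled = list(case_ids)
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length lists of class labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of cases given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each classifier's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    classes = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in classes) / (n * n)
    if expected == 1.0:
        return 1.0  # both classifiers always emit the same single label
    return (observed - expected) / (1.0 - expected)


# Placeholder identifiers for 41 SCLC and 46 NSCLC cell lines (hypothetical).
cases = [f"SCLC_{i}" for i in range(41)] + [f"NSCLC_{i}" for i in range(46)]
splits = random_splits(cases)

# Hypothetical classifications of four test cases by two models.
model_1 = ["SCLC", "SCLC", "NSCLC", "NSCLC"]
model_2 = ["SCLC", "NSCLC", "NSCLC", "NSCLC"]
kappa = cohen_kappa(model_1, model_2)  # 0.5 for these example labels
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is a stricter measure of concordance between models than the percentage of matching classifications.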

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Carcinoma, Non-Small-Cell Lung / genetics
  • Carcinoma, Non-Small-Cell Lung / pathology
  • Carcinoma, Small Cell / genetics
  • Carcinoma, Small Cell / pathology
  • Cell Line, Tumor / classification
  • Cell Line, Tumor / metabolism*
  • CpG Islands / genetics
  • DNA Methylation*
  • DNA, Neoplasm / genetics
  • DNA, Neoplasm / metabolism
  • Discriminant Analysis
  • Genetic Markers / genetics
  • Humans
  • Lung Neoplasms
  • Neural Networks, Computer
  • Polymerase Chain Reaction

Substances

  • DNA, Neoplasm
  • Genetic Markers