Risk classification of cancer survival using ANN with gene expression data from multiple laboratories

Comput Biol Med. 2014 May:48:1-7. doi: 10.1016/j.compbiomed.2014.02.006. Epub 2014 Feb 22.

Abstract

Numerous cancer studies have combined gene expression experiments and clinical survival data to predict the prognosis of patients of specific gene types. However, most results of these studies were data-dependent and did not generalize to other data sets. This study performed cross-laboratory validations on cancer patient data from four hospitals. We investigated the feasibility of survival risk prediction using high-throughput gene expression data and clinical data, and analyzed multiple data sets for prognostic applications in lung cancer diagnosis. After building tens of thousands of ANN architectures on the training data, we identified five survival-time-correlated genes from four microarray gene expression data sets by examining the correlation between gene signatures and patient survival time. The experimental results showed that gene expression data can be used for valid classification of cancer patient survival, with an overall accuracy of 83.0% on survival-time trusted data. Patients assigned to the high-risk group had a lower median overall survival than patients in the low-risk group (log-rank test, P < 0.00001), indicating that the prediction model separated the two groups well. This study provides a foundation for further clinical studies and for research into other types of cancer. We hope these findings will improve prognostic methods for cancer patients.
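The comparison of high-risk and low-risk groups by a log-rank test can be illustrated with a minimal sketch. It is not the authors' code: the function name `logrank_test`, the variable names, and the toy survival times below are all hypothetical and are not data from the study. It assumes two groups, right-censored observations (event flag 1 = death observed, 0 = censored), and the standard chi-square approximation with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p_value).

    times_*  : follow-up times for each patient in the group
    events_* : 1 if death was observed at that time, 0 if censored
    """
    # Tag each observation with its group (0 = group A, 1 = group B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Only distinct times at which at least one event occurred contribute.
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data (months of follow-up), NOT taken from the study.
high_risk_times = [5, 8, 12, 14, 20]
high_risk_events = [1, 1, 1, 1, 0]
low_risk_times = [25, 30, 34, 40, 45]
low_risk_events = [1, 0, 1, 0, 0]

chi2, p_value = logrank_test(high_risk_times, high_risk_events,
                             low_risk_times, low_risk_events)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

In practice the group labels would come from the classifier's predicted risk class, and an established survival analysis package would normally be preferred over a hand-written test.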

Keywords: Gene expression; Lung cancer; Machine learning; Microarray; Neural network; Outcome prediction; Survival analysis.

MeSH terms

  • Computational Biology / methods
  • Female
  • Gene Expression Profiling / methods*
  • Humans
  • Kaplan-Meier Estimate
  • Lung Neoplasms / diagnosis*
  • Lung Neoplasms / mortality*
  • Male
  • Models, Statistical*
  • Neural Networks, Computer*
  • Prognosis
  • Risk