Breast cancer prognosis risk estimation using integrated gene expression and clinical data

Biomed Res Int. 2014:2014:459203. doi: 10.1155/2014/459203. Epub 2014 May 14.

Abstract

Background: Novel prognostic markers are needed so that newly diagnosed breast cancer patients do not undergo unnecessary therapy. Various studies based on microarray gene expression datasets have generated gene signatures to predict prognosis outcomes, while ignoring the large amount of information contained in established clinical markers. Moreover, small sample sizes in individual microarray datasets remain a bottleneck, yielding gene signatures that are not robust and show limited predictive power. The aim of this study is to achieve high classification accuracy first for the good prognosis group and then for the poor prognosis group.

Methods: We propose a novel algorithm called the IPRE (integrated prognosis risk estimation) algorithm. We integrated microarray datasets from multiple studies to increase the sample size (∼2,700 samples). The IPRE algorithm consists of two components: a virtual chromosome used to extract a 79-gene prognostic signature, and a multivariate logistic regression model that incorporates clinical data along with expression data to generate a risk score formula that accurately categorizes breast cancer patients into two prognosis groups.
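To make the second component concrete, the sketch below fits a multivariate logistic regression whose inputs mix gene expression values with clinical variables, uses the linear predictor r = b + Σ w_j x_j as a risk score, and assigns a patient to the poor prognosis group when 1/(1 + e^(−r)) ≥ 0.5. It is a minimal illustration under stated assumptions, not the IPRE risk score formula: the two "gene" columns, the standardized tumor size and lymph node columns, the six synthetic training rows, the plain gradient-descent fitter, and the 0.5 threshold are all hypothetical choices made for this example, and the 79-gene signature extraction is not modeled.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_score(b: float, w: Sequence[float], x: Sequence[float]) -> float:
    """Linear predictor: intercept plus sum of coefficient * feature."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def fit_logistic(X, y, lr=0.1, epochs=2000) -> Tuple[float, List[float]]:
    """Fit intercept and weights by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(risk_score(b, w, xi)) - yi  # d(log-loss)/d(score) for one patient
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def classify(b, w, x, threshold=0.5) -> str:
    """'poor' if the predicted probability of poor prognosis reaches the threshold."""
    return "poor" if sigmoid(risk_score(b, w, x)) >= threshold else "good"


# Hypothetical features per patient (synthetic, not study data):
# [gene_A expression, gene_B expression, tumor size (standardized), lymph node positive (0/1)]
X = [
    [-1.0, -0.8, -0.5, 0],
    [-0.7, -1.2, -0.9, 0],
    [-1.1, -0.4, -0.2, 0],
    [0.9, 1.1, 0.6, 1],
    [1.2, 0.7, 1.0, 1],
    [0.6, 1.3, 0.4, 1],
]
y = [0, 0, 0, 1, 1, 1]  # 1 = poor prognosis

b, w = fit_logistic(X, y)
print([classify(b, w, x) for x in X])
# ['good', 'good', 'good', 'poor', 'poor', 'poor']
```

In a real analysis the expression columns would come from the selected signature genes and the clinical covariates from patient records; a library fitter with regularization would normally replace the hand-written gradient descent, which is used here only to keep the example dependency-free.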

Results: Evaluation on two testing datasets showed that the IPRE algorithm achieved high classification accuracies of 82% and 87%, far greater than those of existing algorithms.
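The abstract does not state exactly how the 82% and 87% accuracies were computed. The sketch below assumes, hypothetically, that accuracy means the fraction of patients whose predicted prognosis group matches the observed one, and it also reports accuracy within each true group, in line with the Background's aim of accurate classification for the good and the poor prognosis groups separately. The function name and the five labels are invented for illustration and are not data from the study.

```python
from typing import Dict, Sequence


def group_accuracies(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Overall accuracy plus accuracy within each true prognosis group."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    result = {"overall": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)}
    for group in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == group]
        # fraction of patients truly in `group` that were predicted as `group`
        result[group] = sum(y_pred[i] == group for i in idx) / len(idx)
    return result


# Hypothetical labels for five test patients
y_true = ["good", "good", "good", "poor", "poor"]
y_pred = ["good", "good", "poor", "poor", "poor"]
print(group_accuracies(y_true, y_pred))
# {'overall': 0.8, 'good': 0.6666666666666666, 'poor': 1.0}
```

When poor prognosis is treated as the positive class, accuracy within the good group is the specificity and accuracy within the poor group is the sensitivity.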

MeSH terms

  • Algorithms
  • Biomarkers, Tumor / genetics
  • Breast Neoplasms / diagnosis
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / pathology
  • Databases, Genetic
  • Female
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Logistic Models
  • Microarray Analysis*
  • Prognosis*
  • Risk Factors

Substances

  • Biomarkers, Tumor