BreastMark: an integrated approach to mining publicly available transcriptomic datasets relating to breast cancer outcome

Breast Cancer Res. 2013;15(4):R52. doi: 10.1186/bcr3444.

Abstract

Introduction: Breast cancer is a complex, heterogeneous disease for which a substantial resource of transcriptomic data is available. Gene expression data have facilitated the division of breast cancer into at least five molecular subtypes, namely luminal A, luminal B, HER2, normal-like and basal. Once identified, breast cancer subtypes can inform clinical decisions surrounding patient treatment and prognosis. In particular, it is important to identify patients at risk of developing aggressive disease so that the level of clinical intervention can be tailored accordingly.

Methods: We have developed a user-friendly, web-based system to allow the evaluation of genes/microRNAs (miRNAs) that are significantly associated with survival in breast cancer and its molecular subtypes. The algorithm combines gene expression data from multiple microarray experiments, which frequently also contain miRNA expression information, with detailed clinical data to correlate outcome with gene/miRNA expression levels. It integrates gene expression and survival data from 26 datasets on 12 different microarray platforms, corresponding to approximately 17,000 genes in up to 4,738 samples. In addition, the prognostic potential of 341 miRNAs can be analysed.
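The abstract does not spell out how expression is split into groups or which survival test is applied, so the following is only a minimal sketch of one plausible way to pool datasets measured on different platforms: each sample is labelled high or low relative to the median of its own dataset (an assumed cutoff), the labels are pooled, and the two groups are compared with a standard two-group log-rank test. All function names and the toy values are hypothetical and are not taken from BreastMark.

```python
import math
from statistics import median

def dichotomize_by_median(values):
    """Label each sample 'high' if above its own dataset's median, else 'low'."""
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]

def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square, p-value) on 1 degree of freedom."""
    observed_minus_expected = 0.0
    variance = 0.0
    # Only time points at which at least one event occurred contribute.
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        at_risk = [(g, tt, e) for tt, e, g in zip(times, events, groups) if tt >= t]
        n = len(at_risk)
        n_high = sum(1 for g, _, _ in at_risk if g == "high")
        d = sum(1 for _, tt, e in at_risk if tt == t and e)
        d_high = sum(1 for g, tt, e in at_risk if tt == t and e and g == "high")
        # Observed events in the high group minus the hypergeometric expectation.
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def pooled_survival_association(datasets):
    """Dichotomize within each dataset, then test the pooled high/low labels."""
    times, events, groups = [], [], []
    for ds in datasets:
        groups += dichotomize_by_median(ds["expression"])
        times += ds["time"]
        events += ds["event"]
    return logrank_test(times, events, groups)

# Toy data (invented): two datasets whose expression scales are not comparable.
toy_datasets = [
    {"expression": [1, 2, 3, 4, 5, 6],
     "time": [60, 55, 50, 20, 15, 10],   # follow-up in months
     "event": [0, 0, 1, 1, 1, 1]},       # 1 = event observed, 0 = censored
    {"expression": [100, 200, 300, 400],
     "time": [48, 40, 12, 8],
     "event": [0, 1, 1, 1]},
]

chi2, p_value = pooled_survival_association(toy_datasets)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Dichotomizing within each dataset before pooling avoids comparing raw intensities across platforms, which is the main reason for this design choice in the sketch; other cutoffs or a Cox regression stratified by dataset would be equally reasonable alternatives.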

Results: We demonstrated the robustness of our approach by comparison with two commercially available prognostic tests, Oncotype DX and MammaPrint. Our algorithm complements these prognostic tests and is consistent with their findings. In addition, BreastMark can act as a powerful reductionist approach to these more complex gene signatures, eliminating superfluous genes and potentially reducing the cost and complexity of these multi-index assays. Known miRNA prognostic markers, miR-205 and miR-93, were used to confirm the prognostic value of this tool in a miRNA setting. We also applied the algorithm to examine expression of 58 receptor tyrosine kinases in the basal-like subtype, identifying six receptor tyrosine kinases associated with poor disease-free survival and/or overall survival (EPHA5, FGFR1, FGFR3, VEGFR1, PDGFRβ, and TIE1). A web application for using this algorithm is currently available.

Conclusions: BreastMark is a powerful tool for examining putative gene/miRNA prognostic markers in breast cancer. Its value lies in the preliminary assessment of putative biomarkers, and it will be of particular use to research groups with limited bioinformatics facilities.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Breast Neoplasms / diagnosis
  • Breast Neoplasms / drug therapy
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / mortality
  • Computational Biology / methods
  • Data Mining / methods*
  • Databases, Nucleic Acid
  • Female
  • Gene Expression Profiling*
  • Humans
  • MicroRNAs / genetics
  • Middle Aged
  • Prognosis
  • Receptor Protein-Tyrosine Kinases / genetics
  • Reproducibility of Results
  • Software*
  • Transcriptome*
  • Web Browser

Substances

  • MicroRNAs
  • Receptor Protein-Tyrosine Kinases