Using microarray-based subtyping methods for breast cancer in the era of high-throughput RNA sequencing

Mol Oncol. 2018 Dec;12(12):2136-2146. doi: 10.1002/1878-0261.12389. Epub 2018 Oct 29.

Abstract

Breast cancer is a highly heterogeneous disease that can be classified into multiple subtypes based on the tumor transcriptome. Most of the subtyping schemes in clinical use today are derived from analyses of microarray data from thousands of tumors, combined with clinical data for the patients from whom the tumors were isolated. However, RNA sequencing (RNA-Seq) is gradually replacing microarrays as the preferred transcriptomics platform. Although transcript abundances measured by the two technologies are largely compatible, subtyping methods developed for probe-based microarray data do not accept RNA-Seq data as input. Here, we present an RNA-Seq data processing pipeline that maps sequencing reads to the probe set target sequences instead of the human reference genome, thereby enabling probe-based subtyping of breast cancer tumor tissue using sequencing-based transcriptomics. By analyzing 66 breast cancer tumors for which gene expression was measured using both microarrays and RNA-Seq, we show that RNA-Seq data processed with our pipeline can be compared directly to microarray data. Additionally, we demonstrate that the established subtyping method CITBCMST (Guedj et al., ), which relies on a signature of 375 probe sets to classify samples into the six subtypes basL, lumA, lumB, lumC, mApo, and normL, can be applied without further modification. This pipeline enables a seamless transition to sequencing-based transcriptomics for future clinical purposes.
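
The idea of quantifying RNA-Seq reads per probe set and comparing the result with microarray measurements for the same tumor can be illustrated with a minimal sketch. It is not the authors' pipeline: the alignment input format, the log2(count + 1) transform, the choice of Spearman correlation, and all names (`probe_set_expression`, `spearman`, the `ps_*` identifiers, the numeric values) are assumptions made for illustration only.

```python
import math
from collections import Counter


def probe_set_expression(alignments, probe_set_ids):
    """Count reads per probe set and return log2(count + 1) values.

    `alignments` is an iterable of (read_id, probe_set_id) pairs, a
    hypothetical stand-in for the output of an aligner that was run
    against probe set target sequences rather than a reference genome.
    """
    counts = Counter(probe_set for _, probe_set in alignments)
    return {ps: math.log2(counts.get(ps, 0) + 1) for ps in probe_set_ids}


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks).

    Assumes neither input is constant; a constant input would give a
    zero denominator.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Made-up data for one tumor: reads assigned to probe sets, and the
# microarray intensities for the same (hypothetical) probe sets.
probe_sets = ["ps_a", "ps_b", "ps_c", "ps_d"]
alignments = (
    [("r%d" % i, "ps_a") for i in range(1)]
    + [("r%d" % i, "ps_b") for i in range(1, 4)]
    + [("r%d" % i, "ps_c") for i in range(4, 11)]
)
microarray = {"ps_a": 6.1, "ps_b": 7.4, "ps_c": 9.0, "ps_d": 4.2}

rnaseq = probe_set_expression(alignments, probe_sets)
rho = spearman(
    [rnaseq[ps] for ps in probe_sets],
    [microarray[ps] for ps in probe_sets],
)
print(rnaseq)
print(round(rho, 3))
```

Because both platforms are summarized on the same probe set identifiers, a classifier that expects probe-set-level input could in principle be given the RNA-Seq values without remapping to genes, which is the property the abstract describes.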

Keywords: RNA sequencing; breast cancer; gene expression; molecular subtyping.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms / genetics*
  • Female
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Sequence Analysis, RNA / methods
  • Transcriptome*

Associated data

  • GENBANK/GSE43358