Curation of the Pancreatic Ductal Adenocarcinoma Subset of the Cancer Genome Atlas Is Essential for Accurate Conclusions about Survival-Related Molecular Mechanisms

Clin Cancer Res. 2018 Aug 15;24(16):3813-3819. doi: 10.1158/1078-0432.CCR-18-0290. Epub 2018 May 8.

Abstract

Purpose: Publicly available databases, for example, The Cancer Genome Atlas (TCGA), containing clinical and molecular data from many patients are useful in validating the contribution of particular genes to disease mechanisms and in forming novel hypotheses relating to clinical outcomes.

Experimental Design: The impact of key drivers of cancer progression can be assessed by segregating a patient cohort by certain molecular features and constructing survival plots using the associated clinical data. However, conclusions drawn from this straightforward analysis are highly dependent on the quality and source of tissue samples, as demonstrated through the pancreatic ductal adenocarcinoma (PDAC) subset of TCGA.

Results: Analyses of the PDAC-TCGA database, which contains mainly resectable cancer samples from patients in stage IIB, reveal a difference from widely known historic median and 5-year survival rates of PDAC. A similar discrepancy was observed in lung, stomach, and liver cancer subsets of TCGA. The whole transcriptome expression patterns of PDAC-TCGA revealed a cluster of samples derived from neuroendocrine tumors, which have a distinctive biology and better disease prognosis than PDAC. Furthermore, PDAC-TCGA contains numerous pseudo-normal samples, as well as those that arose from tumors not classified as PDAC.

Conclusions: Inclusion of misclassified samples in the bioinformatic analyses distorts the association of molecular biomarkers with clinical outcomes, altering multiple published conclusions used to support and motivate experimental research. Hence, the stringent scrutiny of type and origin of samples included in the bioinformatic analyses by researchers, databases, and web-tool developers is of crucial importance for generating accurate conclusions.

Clin Cancer Res; 24(16); 3813-9. ©2018 AACR.
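The survival analysis described under Experimental Design (segregating a cohort by a molecular feature and comparing survival curves) is commonly done with the Kaplan-Meier product-limit estimator. The following is a minimal pure-Python sketch under stated assumptions, not the authors' pipeline: the `Sample` record, its `keep` flag standing in for curation (e.g. excluding neuroendocrine or non-PDAC samples), the median-expression cutoff, and the toy data are all hypothetical. It only illustrates how dropping flagged samples can change group membership and the estimated median survival.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Sample:
    # Hypothetical per-patient record; field names are illustrative, not TCGA's schema.
    sample_id: str
    time: float        # follow-up time, e.g. in months
    event: bool        # True = death observed, False = censored
    expression: float  # expression level of some gene of interest
    keep: bool = True  # False if curation flags the sample (e.g. non-PDAC histology)


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return the Kaplan-Meier step points [(t, S(t))], starting at (0, 1)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time point (ties).
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: S(t) *= (1 - d_t / n_t).
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censored subjects both leave the risk set
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest time at which S(t) <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def split_by_expression(samples: Sequence[Sample], curate: bool = True):
    """Split a cohort into high/low groups at the (upper) median expression."""
    cohort = [s for s in samples if s.keep] if curate else list(samples)
    values = sorted(s.expression for s in cohort)
    cutoff = values[len(values) // 2]
    high = [s for s in cohort if s.expression >= cutoff]
    low = [s for s in cohort if s.expression < cutoff]
    return high, low


if __name__ == "__main__":
    # Toy data for illustration only; not from TCGA.
    samples = [
        Sample("A", 5, True, 9.0),
        Sample("B", 8, True, 7.5),
        Sample("C", 30, False, 8.0, keep=False),  # e.g. flagged as neuroendocrine
        Sample("D", 12, True, 2.0),
        Sample("E", 20, False, 1.0),
    ]
    for curate in (False, True):
        high, low = split_by_expression(samples, curate=curate)
        for label, group in (("high", high), ("low", low)):
            curve = kaplan_meier([s.time for s in group], [s.event for s in group])
            print(f"curate={curate} {label}: n={len(group)} median={median_survival(curve)}")
```

In this toy cohort, the single long-surviving flagged sample lands in the high-expression group when curation is skipped and raises that group's estimated median survival; removing it changes the comparison. A real analysis would more likely use an established survival library and a formal test such as the log-rank test.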

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenocarcinoma / classification
  • Adenocarcinoma / genetics*
  • Adenocarcinoma / pathology
  • Biomarkers, Tumor / genetics*
  • Carcinoma, Pancreatic Ductal / classification
  • Carcinoma, Pancreatic Ductal / genetics*
  • Carcinoma, Pancreatic Ductal / pathology
  • Computational Biology
  • Disease-Free Survival
  • Female
  • Gene Expression Regulation, Neoplastic / genetics
  • Genome, Human / genetics
  • Humans
  • Kaplan-Meier Estimate
  • Male
  • Neuroendocrine Tumors / genetics
  • Neuroendocrine Tumors / pathology
  • Prognosis
  • SEER Program
  • Transcriptome / genetics*
  • Translational Research, Biomedical

Substances

  • Biomarkers, Tumor