Gene expression profiling of 1200 pancreatic ductal adenocarcinoma reveals novel subtypes

BMC Cancer. 2018 May 29;18(1):603. doi: 10.1186/s12885-018-4546-8.

Abstract

Background: Pancreatic ductal adenocarcinoma (PDAC) is the fourth leading cause of cancer-related death in the world, with a five-year survival rate of less than 5%. PDAC tumors are not all alike: inter-tumoral heterogeneity poses a great challenge to personalized treatment of PDAC.

Methods: To dissect the molecular heterogeneity of PDAC, we performed a retrospective meta-analysis of whole-transcriptome data from more than 1200 PDAC patients. Subtypes were identified using a non-negative matrix factorization (NMF) biclustering method. We used gene set enrichment analysis (GSEA) for the molecular characterization of the identified subtypes and survival analysis for their clinical characterization.
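To illustrate the general idea of NMF-based subtyping, the following is a minimal sketch, not the authors' biclustering pipeline. It assumes a small synthetic genes-by-samples matrix, plain multiplicative-update NMF with a Frobenius objective, and a simple rule that assigns each sample to its dominant component. The function names `nmf` and `assign_subtypes`, the rank, and the toy data are all hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x samples) as W @ H
    using multiplicative updates (Frobenius-norm objective)."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank)) + eps   # gene loadings per component
    H = rng.random((rank, n_samples)) + eps  # component weights per sample
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def assign_subtypes(W, H):
    """Assign each sample to the component with the largest contribution.
    H is rescaled by the column sums of W so that the arbitrary scaling
    between W and H does not affect the assignment."""
    scale = W.sum(axis=0)
    return np.argmax(H * scale[:, None], axis=0)

# Toy expression matrix (assumption): 6 genes x 6 samples with two blocks.
V = np.full((6, 6), 0.1)
V[:3, :3] = 5.0   # genes 0-2 are high in samples 0-2
V[3:, 3:] = 5.0   # genes 3-5 are high in samples 3-5

W, H = nmf(V, rank=2)
labels = assign_subtypes(W, H)
print(labels)
```

In this toy setting the columns of `W` play the role of subtype-associated gene programs and the sample labels play the role of subtype calls; a real analysis would also need normalization across datasets and a principled choice of rank.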

Results: Six molecularly and clinically distinct subtypes of PDAC, L1-L6, were identified and grouped into tumor-specific (L1, L2 and L6) and stroma-specific (L3, L4 and L5) subtypes. Among the tumor-specific subtypes, L1 (~ 22%) is enriched for carbohydrate metabolism-related gene sets and has intermediate survival. L2 (~ 22%) has the worst clinical outcomes and is enriched for cell proliferation-related gene sets. About 23% of patients can be classified into L6, which has intermediate survival and is enriched for lipid and protein metabolism-related gene sets. Stroma-specific subtypes may have a high content of non-epithelial components, such as collagen, immune cells and islet cells. L3 (~ 12%) has poor survival and is enriched for collagen-associated gene sets. L4 (~ 14%) is enriched for various immune-related gene sets and has relatively good survival. L5 (~ 7%) has good clinical outcomes and is enriched for neurotransmitter and insulin secretion-related gene sets. In addition, we identified 160 subtype-specific markers and built a deep learning-based classifier for PDAC. We also applied our classification system to validation datasets and observed highly similar molecular and clinical characteristics between the corresponding subtypes.
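Survival differences between subtypes of this kind are commonly summarized with Kaplan-Meier curves. The abstract does not state which estimator was used, so the following is only a sketch under that assumption; the function name `kaplan_meier` and the follow-up times are invented for illustration and are not data from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        n_at_risk -= leaving
    return curve

# Hypothetical follow-up (months) for one subtype; 0 marks a censored patient.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)  # [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Computing one such curve per subtype and comparing them (for example with a log-rank test) is one way the "good", "intermediate" and "poor" survival labels could be made quantitative.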

Conclusions: Our study analyzes the largest cohort of PDAC gene expression profiles investigated so far, which greatly increases statistical power and provides more robust results. We identified six molecularly and clinically distinct subtypes that give a more complete picture of PDAC heterogeneity. The 160 subtype-specific markers and the deep learning-based classification system may be used to better stratify PDAC patients for personalized treatment.

Keywords: Biclustering; Biomarkers; Deep learning; Heterogeneity; Pancreatic ductal adenocarcinoma; Subtype.

Publication types

  • Meta-Analysis

MeSH terms

  • Aged
  • Biomarkers, Tumor / genetics*
  • Carcinoma, Pancreatic Ductal / genetics*
  • Carcinoma, Pancreatic Ductal / mortality
  • Carcinoma, Pancreatic Ductal / pathology
  • Carcinoma, Pancreatic Ductal / therapy
  • Cluster Analysis
  • Data Analysis
  • Datasets as Topic
  • Deep Learning
  • Female
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Male
  • Microarray Analysis
  • Middle Aged
  • Pancreatic Neoplasms / genetics*
  • Pancreatic Neoplasms / mortality
  • Pancreatic Neoplasms / pathology
  • Pancreatic Neoplasms / therapy
  • Precision Medicine / methods
  • Prognosis
  • Retrospective Studies
  • Survival Analysis
  • Transcriptome / genetics

Substances

  • Biomarkers, Tumor