A network-based approach to identify disease-associated gene modules through integrating DNA methylation and gene expression

Biochem Biophys Res Commun. 2015 Sep 25;465(3):437-42. doi: 10.1016/j.bbrc.2015.08.033. Epub 2015 Aug 14.

Abstract

The formation and progression of complex diseases are generally the joint effect of genetic and epigenetic disorders, so an integrative analysis of epigenetic and genetic data is essential for understanding disease mechanisms. In this study, we integrate Illumina 450K DNA methylation and gene expression data to calculate the edge weights of a gene network using Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA). The approach considers all methylation values of the CpG sites in a gene rather than averaging them, as other studies have done, which ignores the variability among methylation sites. By comparing the topological features of the control network with those of the case network, including global and local features, we identify candidate disease-associated genes and gene modules. We apply the approach to real data from breast invasive carcinoma (BRCA). It successfully identifies breast cancer susceptibility genes such as TP53, BRCA1, EP300, CDK2 and MCM7, most of which were previously known to be associated with breast cancer. In addition, GO and pathway enrichment analyses indicate that these genes are enriched in cell apoptosis and regulation of cell death, which are cancer-related biological processes. Importantly, by analyzing the functions of these genes and comparing their expression and methylation values between cases and controls, we find some genes, such as VASN and SNRPD3, and gene modules targeted by POLR2C, CHMP1B and TAF9, which might be novel breast cancer-related biomarkers.
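The abstract does not spell out how PCA and CCA are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each gene is described by a samples-by-features matrix (all of its CpG methylation values plus its expression value), that PCA reduces each gene's matrix before comparison, and that the edge weight between two genes is their first canonical correlation. The function names, the `var_kept` threshold and the feature layout are all hypothetical.

```python
import numpy as np


def pca_reduce(X, var_kept=0.9):
    """Project centered X (samples x features) onto the leading principal
    components that together explain at least `var_kept` of the variance.
    The 0.9 default is an illustrative assumption."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    if var.sum() == 0:
        return Xc[:, :0]  # constant input: no informative component
    ratio = np.cumsum(var) / var.sum()
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(s))
    return U[:, :k] * s[:k]  # principal component scores


def _orthonormal_basis(X):
    """Orthonormal basis of the column space of centered X,
    dropping numerically null directions."""
    if X.shape[1] == 0:
        return X
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    if s[0] == 0:
        return U[:, :0]
    return U[:, s > s[0] * 1e-10]


def first_canonical_corr(X, Y):
    """Largest canonical correlation between X and Y (same samples in rows).
    It equals the top singular value of Qx^T Qy, where Qx and Qy are
    orthonormal bases of the centered column spaces."""
    Qx, Qy = _orthonormal_basis(X), _orthonormal_basis(Y)
    if Qx.shape[1] == 0 or Qy.shape[1] == 0:
        return 0.0
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clip rounding noise above 1


def edge_weight(meth_a, expr_a, meth_b, expr_b, var_kept=0.9):
    """Hypothetical edge weight between genes A and B.
    meth_*: samples x CpG sites; expr_*: one expression value per sample."""
    feat_a = pca_reduce(np.column_stack([meth_a, expr_a]), var_kept)
    feat_b = pca_reduce(np.column_stack([meth_b, expr_b]), var_kept)
    return first_canonical_corr(feat_a, feat_b)
```

Under these assumptions, computing `edge_weight` separately on case samples and on control samples for every gene pair would give the two weighted networks whose topological features are then compared. Keeping every CpG column, rather than a per-gene mean, is what lets the weight reflect variability among methylation sites.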

Keywords: Canonical correlation analysis; DNA methylation; Gene expression; Gene network; Integrative analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Biomarkers, Tumor / genetics*
  • Breast Neoplasms / diagnosis
  • Breast Neoplasms / genetics*
  • Computer Simulation
  • DNA Methylation / genetics*
  • DNA, Neoplasm / genetics*
  • Female
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / genetics
  • Genetic Association Studies
  • Genetic Predisposition to Disease / genetics
  • Humans
  • Models, Genetic
  • Molecular Sequence Data
  • Neoplasm Proteins / genetics*
  • Oligonucleotide Array Sequence Analysis / methods
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Systems Integration

Substances

  • Biomarkers, Tumor
  • DNA, Neoplasm
  • Neoplasm Proteins