cuRRBS: simple and robust evaluation of enzyme combinations for reduced representation approaches

Nucleic Acids Res. 2017 Nov 16;45(20):11559-11569. doi: 10.1093/nar/gkx814.

Abstract

DNA methylation is an important epigenetic modification in many species that is critical for development and implicated in ageing and in many complex diseases, such as cancer. Many cost-effective genome-wide analyses of DNA modifications rely on restriction enzymes capable of digesting genomic DNA at defined sequence motifs. There are hundreds of restriction enzyme families, but few have been used to date because no tool is available for the systematic evaluation of restriction enzyme combinations that can enrich for certain sites of interest in a genome. Herein, we present customised Reduced Representation Bisulfite Sequencing (cuRRBS), a novel and easy-to-use computational method that solves this problem. By computing the optimal enzymatic digestions and size selection steps required, cuRRBS generalises the traditional MspI-based Reduced Representation Bisulfite Sequencing (RRBS) protocol to all restriction enzyme combinations. In addition, cuRRBS estimates the fold-reduction in sequencing costs and provides a robustness value for the personalised RRBS protocol, allowing users to tailor the protocol to their experimental needs. Moreover, we show in silico that cuRRBS-defined restriction enzymes consistently outperform MspI digestion in many biological systems, considering both CpG and CHG contexts. Finally, we have validated the accuracy of cuRRBS predictions for single and double enzyme digestions using two independent experimental datasets.
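
To make the idea of "enzymatic digestion plus size selection" concrete, the following is a minimal illustrative sketch in Python. It is not the cuRRBS implementation, and every function name in it (cut_positions, digest, size_select, sites_covered) is hypothetical. It assumes a single DNA strand, plain (non-degenerate) recognition motifs, and that a site of interest counts as captured whenever it lies anywhere inside a size-selected fragment; sequencing read length, bisulfite conversion, and the scoring and robustness measures reported by cuRRBS are deliberately ignored. The only enzyme fact used is that MspI recognises CCGG and cuts after the first C.

```python
import re


def cut_positions(seq, motif, offset):
    """Cut coordinates for every occurrence of a motif, including overlapping ones.

    `offset` is the distance from the start of the motif to the cut,
    e.g. 1 for MspI (C^CGG). Assumes `motif` contains only A/C/G/T.
    """
    return [m.start() + offset for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def digest(seq, enzymes):
    """In silico digestion with one or more enzymes.

    `enzymes` maps an enzyme name to a (motif, offset) pair.
    Returns fragments as half-open (start, end) intervals.
    """
    cuts = {0, len(seq)}
    for motif, offset in enzymes.values():
        cuts.update(cut_positions(seq, motif, offset))
    ordered = sorted(cuts)
    return list(zip(ordered, ordered[1:]))


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls within [min_len, max_len]."""
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]


def sites_covered(fragments, sites):
    """Count sites of interest that fall inside any retained fragment."""
    return sum(any(s <= p < e for s, e in fragments) for p in sites)


# Toy sequence with three MspI sites (CCGG), purely for illustration.
toy_seq = "AAA" + "CCGG" + "TTTTT" + "CCGG" + "AAAAAAAAAA" + "CCGG" + "TT"
toy_enzymes = {"MspI": ("CCGG", 1)}
toy_sites = [4, 13, 27]  # 0-based positions of the CpG cytosines in toy_seq

fragments = digest(toy_seq, toy_enzymes)
selected = size_select(fragments, min_len=8, max_len=15)
n_covered = sites_covered(selected, toy_sites)
print(fragments, selected, n_covered)
```

Under these assumptions, comparing enzyme combinations amounts to repeating the digest and size-selection steps for each candidate set of enzymes and size range, then comparing how many sites of interest are retained against how much sequence would have to be sequenced. A double digestion is expressed simply by adding a second entry to the enzymes dictionary.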

MeSH terms

  • Animals
  • Arabidopsis / genetics
  • Binding Sites / genetics
  • CCCTC-Binding Factor / genetics
  • CCCTC-Binding Factor / metabolism
  • Computational Biology / methods*
  • CpG Islands / genetics
  • DNA Methylation / genetics*
  • DNA Restriction Enzymes / chemistry
  • Humans
  • Induced Pluripotent Stem Cells / cytology
  • Mice
  • Nuclear Respiratory Factor 1 / genetics
  • Nuclear Respiratory Factor 1 / metabolism
  • Sequence Analysis, DNA / economics*
  • Sequence Analysis, DNA / methods*
  • Whole Genome Sequencing / methods*

Substances

  • CCCTC-Binding Factor
  • CTCF protein, human
  • Nrf1 protein, mouse
  • Nuclear Respiratory Factor 1
  • DNA Restriction Enzymes