A Method for Identification of the Methylation Level of CpG Islands From NGS Data

Sci Rep. 2020 May 25;10(1):8635. doi: 10.1038/s41598-020-65406-1.

Abstract

During sample preparation for Next Generation Sequencing (NGS), DNA is fragmented by various methods. Fragmentation shows a persistent bias in the cleavage rates of the different dinucleotides. With the exception of CpG dinucleotides, the previously described biases were consistent with results for DNA cleavage in solution. Here we computed the cleavage rates of all dinucleotides, treating methylated and unmethylated CpG dinucleotides separately, using Whole Genome Sequencing datasets from the 1000 Genomes Project. We found that the cleavage rate of CpG is significantly higher for methylated CpG dinucleotides than for unmethylated ones. Using this information, we developed a classifier that distinguishes cancer from healthy tissue based on the fragmentation status of their CpG islands. A simple Support Vector Machine classifier based on this algorithm achieves an accuracy of 84%. The proposed method allows epigenetic markers to be detected purely from mechanochemical DNA fragmentation, which can be identified by a simple analysis of NGS data.
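To make the idea of a dinucleotide cleavage rate concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that a break position can be inferred from where a read starts on the reference, and it defines the rate of a dinucleotide as the number of breaks falling between its two bases divided by the number of times that dinucleotide occurs in the reference. The function name `dinucleotide_cleavage_rates` and its parameters are hypothetical, and the abstract does not specify the normalization actually used.

```python
from collections import Counter


def dinucleotide_cleavage_rates(reference, break_positions):
    """Estimate per-dinucleotide cleavage rates (illustrative assumption).

    reference: reference sequence as a string of A/C/G/T.
    break_positions: iterable of indices i, each meaning a break between
        reference[i - 1] and reference[i] (e.g. inferred from a read start).
    Returns a dict mapping each dinucleotide to breaks / occurrences.
    """
    reference = reference.upper()

    # Count how often each dinucleotide occurs in the reference.
    occurrences = Counter(
        reference[i - 1:i + 1] for i in range(1, len(reference))
    )

    # Count the breaks that fall inside each dinucleotide.
    breaks = Counter()
    for pos in break_positions:
        if 0 < pos < len(reference):
            breaks[reference[pos - 1:pos + 1]] += 1

    # Keep only unambiguous dinucleotides (skip N and other symbols).
    return {
        dinuc: breaks[dinuc] / count
        for dinuc, count in occurrences.items()
        if set(dinuc) <= set("ACGT")
    }


# Toy example: the reference contains CG twice; one of them is cleaved.
rates = dinucleotide_cleavage_rates("ACGTCGA", [2, 3])
print(rates["CG"], rates["GT"], rates["AC"])
```

Under these assumptions, comparing the CpG rate inside a CpG island between samples would be the kind of feature a classifier could use. Separating methylated from unmethylated CpG sites would additionally require a methylation annotation, which this sketch does not model.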

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Line, Tumor
  • CpG Islands
  • DNA Fragmentation
  • DNA Methylation*
  • Databases, Genetic
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Medulloblastoma / genetics
  • Medulloblastoma / pathology
  • Neoplasms / genetics
  • Neoplasms / pathology
  • Sequence Analysis, DNA
  • Support Vector Machine