Prediction of methylation CpGs and their methylation degrees in human DNA sequences

Comput Biol Med. 2012 Apr;42(4):408-13. doi: 10.1016/j.compbiomed.2011.12.008. Epub 2011 Dec 29.

Abstract

DNA methylation plays a key role in the regulation of gene expression. The most common type of DNA modification is the methylation of cytosine in the CpG dinucleotide. DNA methylation has mostly been detected by experimental methods, which are time-consuming and expensive and struggle to meet the requirements of modern large-scale sequencing technology. Accordingly, it is necessary to develop automatic, reliable prediction methods for DNA methylation. In this study, the trinucleotide composition, a 64-dimensional feature vector of the occurrence frequencies of the 64 trinucleotides in a DNA sequence, was used to build a support vector machine (SVM) model for the prediction of CpG methylation degrees in humans. Evaluated by jackknife validation, the model achieved a correlation coefficient (R) of 0.8223 and a root-mean-square error (RMSE) of 0.2042. The proposed method was also used to predict methylation sites; evaluated by jackknife validation, this model achieved a Matthews correlation coefficient (MCC) of 0.6263 and an accuracy (ACC) of 0.8133. These results indicate that the proposed method is a useful tool for DNA methylation prediction.
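
For illustration only, the trinucleotide composition and the four evaluation measures named in the abstract can be sketched in Python as below. This is not the authors' code. The abstract does not say whether trinucleotides are counted with an overlapping window, how ambiguous bases are treated, or how frequencies are normalized, so the sketch assumes a sliding window of step 1, skips windows containing symbols other than A, C, G, T, and divides by the number of counted windows. All function names are hypothetical, and SVM training and the jackknife loop are left out.

```python
from itertools import product
from math import sqrt

# All 64 trinucleotides in a fixed lexicographic order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRINUCLEOTIDES)}


def trinucleotide_composition(seq):
    """Return the 64 occurrence frequencies of trinucleotides in seq.

    Assumption: overlapping windows (step 1); windows with non-ACGT
    symbols are skipped; frequencies sum to 1 over counted windows.
    """
    seq = seq.upper()
    counts = [0] * 64
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * 64
    return [c / total for c in counts]


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted degrees."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient; returns 0.0 if either side is constant."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    denom = sqrt(var_t * var_p)
    return cov / denom if denom else 0.0


def mcc_and_acc(y_true, y_pred):
    """Matthews correlation coefficient and accuracy for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    acc = (tp + tn) / len(y_true)
    return mcc, acc
```

Under these assumptions, each sequence maps to a fixed-length vector regardless of its length, which is what allows a single SVM to be trained on sequences of varying size. The same features would feed a regression model for methylation degrees (scored by R and RMSE) and a classifier for methylation sites (scored by MCC and ACC).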

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Composition
  • Computational Biology
  • CpG Islands*
  • DNA / chemistry*
  • DNA Methylation*
  • Humans
  • Models, Genetic*
  • Models, Statistical
  • Reproducibility of Results
  • Support Vector Machine

Substances

  • DNA