Gene expression and nucleotide composition are associated with genic methylation level in Oryza sativa

BMC Bioinformatics. 2014 Jan 21:15:23. doi: 10.1186/1471-2105-15-23.

Abstract

Background: The methylation of cytosines at CpG dinucleotides, which plays an important role in gene expression regulation, is one of the most studied epigenetic modifications. Thus far, DNA methylation has been detected mostly by experimental methods, which are not only prone to bench effects and artifacts but are also time-consuming, expensive, and difficult to scale up to many samples. It is therefore useful to develop computational prediction methods for DNA methylation. Our previous studies highlighted the existence of correlations between the GC content of the third codon position (GC₃), methylation, and gene expression. We thus designed a model to predict methylation in Oryza sativa based on genomic sequence features and gene expression data.
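As an illustration of the GC₃ feature mentioned above, the following is a minimal Python sketch that computes the fraction of G or C bases at third codon positions of a coding sequence. The function name `gc3` and the choices to ignore a trailing partial codon and to count every codon (including any stop codon) are assumptions made for this example; they are not taken from the authors' code.

```python
def gc3(cds: str) -> float:
    """Return the fraction of G or C bases at third codon positions.

    Assumptions (illustrative only): the sequence starts in frame,
    any trailing partial codon is ignored, and all codons are counted.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop a trailing partial codon
    thirds = seq[2:usable:3]              # bases at positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    # Codons ATG, GCC, TTA have third bases G, C, A.
    print(gc3("ATGGCCTTA"))
```

For the toy sequence in the usage line, two of the three third-position bases are G or C.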

Results: We first derive equations to describe the relationship between gene methylation levels, GC₃, expression, length, and other gene compositional features. We next assess gene compositional features involving sixmers and their association with methylation levels and other gene-level properties. By applying our sixmer-based approach to rice gene expression data, we show that it can accurately predict methylation (Pearson's correlation coefficient r = 0.79) for the majority (79%) of the genes. MATLAB code with our model is included.
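To make the two ingredients named above concrete, the sketch below counts overlapping sixmers in a sequence and computes Pearson's correlation coefficient between two equal-length lists of values (for example, predicted and measured methylation levels). It is written in Python rather than MATLAB and does not reproduce the authors' model; the function names `sixmer_counts` and `pearson_r`, and the decision to skip windows containing characters other than A, C, G, and T, are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def sixmer_counts(seq: str, k: int = 6) -> Counter:
    """Count overlapping k-mers (k = 6 by default) in a DNA sequence.

    Windows containing anything other than A, C, G, T are skipped
    (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            counts[window] += 1
    return counts


def pearson_r(x, y) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    print(sixmer_counts("ACGTACGTAC").most_common(1))
    print(pearson_r([0.1, 0.4, 0.8], [0.2, 0.5, 0.7]))
```

In a prediction setting, the sixmer counts would serve as per-gene features, and `pearson_r` would compare model output against measured methylation levels.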

Conclusions: Gene expression variation can be used as a predictor of gene methylation levels.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Base Composition / genetics*
  • Codon / genetics*
  • Computational Biology
  • Cytosine / metabolism
  • DNA Methylation*
  • Gene Expression Regulation, Plant*
  • Humans
  • Oryza / genetics*
  • Oryza / metabolism
  • Reproducibility of Results
  • Sequence Analysis, DNA

Substances

  • Codon
  • Cytosine