4mC-CGRU: Identification of N4-Methylcytosine (4mC) sites using convolution gated recurrent unit in Rosaceae genome

Comput Biol Chem. 2023 Dec:107:107974. doi: 10.1016/j.compbiolchem.2023.107974. Epub 2023 Oct 30.

Abstract

DNA N4-methylcytosine (4mC) is an epigenetic modification that affects several biological functions without altering the DNA nucleotide sequence, including DNA conformation, cell development, replication, stability, and structural changes. 4mC plays a critical role in restriction-modification systems, where it prevents restriction enzymes from damaging self-DNA. Existing studies have mainly focused on hand-crafted features to identify 4mC sites, but these methods are inefficient because they are time-consuming and costly. In this work, we propose 4mC-CGRU, a deep learning-based computational model that uses a standard encoding method and learns features automatically to identify 4mC sites from DNA sequences in the Rosaceae genome, specifically in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca). The proposed model combines a convolutional neural network (CNN) with a gated recurrent unit (GRU) network: the CNN extracts informative features from the sequences, and the GRU classifies the DNA sequences. Our approach can therefore automatically extract the features needed to detect 4mC sites from DNA sequences. Performance analysis shows that the proposed model consistently outperforms state-of-the-art methods in detecting 4mC sites.
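The CNN-then-GRU pipeline described above can be sketched as an untrained NumPy forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name its "standard encoding", so one-hot encoding is assumed; the 41-bp window, kernel size, filter count, and hidden size are hypothetical; all function names (`one_hot`, `conv1d_relu`, `gru_last_state`, `init_params`, `predict_4mc_prob`) are illustrative; and the weights are random, so the printed score only demonstrates the data flow, not a real prediction.

```python
import numpy as np

BASES = "ACGT"  # assumed column order for the (hypothetical) one-hot encoding


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) one-hot matrix; unknown bases such as N become all-zero rows."""
    mat = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_relu(x, w, b):
    """'Valid' 1D convolution plus ReLU. x: (L, C_in), w: (K, C_in, C_out), b: (C_out,) -> (L-K+1, C_out)."""
    k = w.shape[0]
    windows = [np.tensordot(x[i:i + k], w, axes=([0, 1], [0, 1])) for i in range(x.shape[0] - k + 1)]
    return np.maximum(np.stack(windows) + b, 0.0)


def gru_last_state(x, params):
    """Run a single-layer GRU over x: (T, D) and return the final hidden state of shape (H,)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    h = np.zeros(Uz.shape[0])
    for x_t in x:
        z = sigmoid(x_t @ Wz + h @ Uz + bz)             # update gate
        r = sigmoid(x_t @ Wr + h @ Ur + br)             # reset gate
        h_cand = np.tanh(x_t @ Wh + (r * h) @ Uh + bh)  # candidate state
        h = (1.0 - z) * h + z * h_cand                  # blend previous and candidate state
    return h


def init_params(rng, c_in=4, kernel=7, filters=16, hidden=8):
    """Random (untrained) weights; every size here is a hypothetical choice."""
    conv_w = rng.normal(0.0, 0.1, (kernel, c_in, filters))
    conv_b = np.zeros(filters)
    gru = []
    for _ in range(3):  # update gate, reset gate, candidate state
        gru += [rng.normal(0.0, 0.1, (filters, hidden)),
                rng.normal(0.0, 0.1, (hidden, hidden)),
                np.zeros(hidden)]
    dense_w = rng.normal(0.0, 0.1, hidden)
    return conv_w, conv_b, tuple(gru), dense_w, 0.0


def predict_4mc_prob(seq, params):
    """CNN extracts local features, GRU summarizes them, a sigmoid unit scores the window."""
    conv_w, conv_b, gru, dense_w, dense_b = params
    feats = conv1d_relu(one_hot(seq), conv_w, conv_b)  # (L-K+1, filters)
    h = gru_last_state(feats, gru)                      # (hidden,)
    return float(sigmoid(h @ dense_w + dense_b))        # score in (0, 1)


if __name__ == "__main__":
    params = init_params(np.random.default_rng(0))
    window = "ACGTTGCAACGTCGATCGATCCGATGCATGCATCGATCGAT"  # hypothetical 41-bp window, C at the centre
    print(predict_4mc_prob(window, params))
```

In practice the weights would be learned from labelled 4mC and non-4mC windows with a deep learning framework (for example, by minimizing binary cross-entropy); the forward pass is written out by hand here only so that each stage (convolution over the encoded sequence, recurrence over the convolutional features, sigmoid output) is visible.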

Keywords: Convolutional Neural Network (CNN); DNA N4-methylcytosine (4mC); Deep Learning (DL); Gated Recurrent Unit (GRU); Statistical metrics.

MeSH terms

  • DNA / chemistry
  • Epigenesis, Genetic
  • Fragaria* / genetics
  • Genome
  • Neural Networks, Computer
  • Rosaceae* / genetics

Substances

  • DNA