MGF6mARice: prediction of DNA N6-methyladenine sites in rice by exploiting molecular graph feature and residual block

Brief Bioinform. 2022 May 13;23(3):bbac082. doi: 10.1093/bib/bbac082.

Abstract

DNA N6-methyladenine (6mA) is produced by the N6 position of the adenine being methylated, which occurs at the molecular level, and is involved in numerous vital biological processes in the rice genome. Given the shortcomings of biological experiments, researchers have developed many computational methods to predict 6mA sites and achieved good performance. However, the existing methods do not consider the occurrence mechanism of 6mA to extract features from the molecular structure. In this paper, a novel deep learning method is proposed by devising DNA molecular graph feature and residual block structure for 6mA sites prediction in rice, named MGF6mARice. Firstly, the DNA sequence is changed into a simplified molecular input line entry system (SMILES) format, which reflects chemical molecular structure. Secondly, for the molecular structure data, we construct the DNA molecular graph feature based on the principle of graph convolutional network. Then, the residual block is designed to extract higher level, distinguishable features from molecular graph features. Finally, the prediction module is used to obtain the result of whether it is a 6mA site. By means of 10-fold cross-validation, MGF6mARice outperforms the state-of-the-art approaches. Multiple experiments have shown that the molecular graph feature and residual block can promote the performance of MGF6mARice in 6mA prediction. To the best of our knowledge, it is the first time to derive a feature of DNA sequence by considering the chemical molecular structure. We hope that MGF6mARice will be helpful for researchers to analyze 6mA sites in rice.

Keywords: DNA N6-methyladenine; DNA molecular graph feature; SMILES; residual block; rice genome.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenine
  • DNA / genetics
  • DNA Methylation
  • Delayed Emergence from Anesthesia* / genetics
  • Oryza* / genetics

Substances

  • DNA
  • Adenine