Hyb4mC: a hybrid DNA2vec-based model for DNA N4-methylcytosine sites prediction

BMC Bioinformatics. 2022 Jun 29;23(1):258. doi: 10.1186/s12859-022-04789-6.

Abstract

Background: DNA N4-methylcytosine (4mC) is part of the restriction-modification system and regulates biological processes such as the initiation of DNA replication, mismatch repair, and transposon inactivation. However, detecting 4mC sites experimentally is time-consuming and expensive. Moreover, because the number of available 4mC samples differs greatly among species, robust multi-species 4mC site prediction is challenging. Effective computational tools for identifying 4mC sites are therefore needed.

Results: This work proposes Hyb4mC, a flexible deep learning-based framework for predicting 4mC sites. Hyb4mC uses the DNA2vec method for sequence embedding, which captures more effective and comprehensive information than sequence-based feature methods. Two different subnets are then used for further analysis: Hyb_Caps and Hyb_Conv. Hyb_Caps is built on a capsule neural network and can generalize from fewer samples. Hyb_Conv combines an attention mechanism with a text convolutional neural network for further feature learning.
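
The embedding step can be illustrated with a minimal sketch. It is not the authors' implementation: the k-mer length (k = 3), the stride of 1, the 4-dimensional vectors, and the randomly initialized lookup table are all assumptions made for illustration, and the helper names kmer_tokens, build_toy_table, and embed are hypothetical. In practice the table would hold pretrained DNA2vec vectors rather than random values.

```python
import random
from itertools import product


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_toy_table(k=3, dim=4, seed=0):
    """Hypothetical stand-in for pretrained k-mer vectors.

    Assigns one random vector to each of the 4**k possible k-mers
    over the alphabet A, C, G, T.
    """
    rng = random.Random(seed)
    return {
        "".join(kmer): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for kmer in product("ACGT", repeat=k)
    }


def embed(seq, table, k=3):
    """Map a sequence to a (num_kmers x dim) matrix by table lookup.

    Assumes the sequence contains only A, C, G, and T.
    """
    return [table[token] for token in kmer_tokens(seq, k)]


tokens = kmer_tokens("ACGTCA", k=3)
matrix = embed("ACGTCA", build_toy_table())
print(tokens)
print(len(matrix), len(matrix[0]))
```

A matrix of this shape (one row per k-mer) is the kind of input that a downstream subnet such as a capsule network or a text convolutional network would consume.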

Conclusions: Extensive benchmark tests show that Hyb4mC significantly improves 4mC site prediction performance compared with recently proposed methods.

Keywords: Capsule Neural Network; DNA N4-methylcytosine; DNA2vec; Site identification; Text Convolutional Neural Network.

MeSH terms

  • DNA* / genetics
  • Neural Networks, Computer*

Substances

  • DNA