Nucleosome positioning based on DNA sequence embedding and deep learning

BMC Genomics. 2022 Apr 13;23(Suppl 1):301. doi: 10.1186/s12864-022-08508-6.

Abstract

Background: Nucleosome positioning is the precise determination of where nucleosomes are located along DNA sequences. With continuing advances in biotechnology and computer technology, the volume of biological data is growing explosively, so it is of practical significance to develop efficient nucleosome positioning algorithms. Convolutional neural networks (CNNs) can capture local features in DNA sequences but ignore the order of bases. Bidirectional recurrent neural networks can compensate for this shortcoming and extract long-range dependency features from DNA sequences.
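The contrast drawn above between convolution and recurrence can be illustrated with a toy sketch. This is not code from the paper. The "AT" filter, the scalar recurrent weights, and the helper names (one_hot, conv_max, rnn_last_state) are hypothetical choices for illustration only. A convolution filter followed by global max pooling returns the same score for "ATGC" and "GCAT", because both contain the motif "AT" and pooling discards where it occurred. A simple recurrence that reads the bases in sequence gives different final states for the two strings.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-dim one-hot vectors (channel order A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv_max(seq, kernel):
    """Valid 1-D convolution of a single filter over the one-hot sequence,
    followed by global max pooling (position information is discarded)."""
    x = one_hot(seq)
    k = len(kernel)
    scores = []
    for i in range(len(x) - k + 1):
        scores.append(sum(kernel[j][c] * x[i + j][c] for j in range(k) for c in range(4)))
    return max(scores)

def rnn_last_state(seq, w, u):
    """Scalar recurrence h_t = tanh(w . x_t + u * h_{t-1}); the final state depends on base order."""
    h = 0.0
    for x_t in one_hot(seq):
        h = math.tanh(sum(wi * xi for wi, xi in zip(w, x_t)) + u * h)
    return h

# Hypothetical filter that responds to the dinucleotide "AT".
AT_FILTER = [[1, 0, 0, 0], [0, 0, 0, 1]]
W = [0.5, -0.3, 0.8, -0.6]  # arbitrary illustrative input weights for A, C, G, T
U = 0.9                     # arbitrary illustrative recurrent weight

if __name__ == "__main__":
    print(conv_max("ATGC", AT_FILTER), conv_max("GCAT", AT_FILTER))  # same pooled score
    print(rnn_last_state("ATGC", W, U), rnn_last_state("GCAT", W, U))  # different final states
```

Within a single window the convolution still sees base order. The order information is lost once filter responses are pooled across the whole sequence, and that gap is what a recurrent layer is meant to fill.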

Results: In this work, we represent DNA sequences with word vectors and propose three new deep learning models for nucleosome positioning, of which the integrative model NP_CBiR achieves better prediction performance. The overall accuracies of NP_CBiR on the H. sapiens, C. elegans, and D. melanogaster datasets are 86.18%, 89.39%, and 85.55%, respectively.
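One common way to treat DNA as "words" for word-vector embedding is to split it into overlapping k-mers and look up one vector per k-mer. The sketch below assumes k = 3, stride 1, and an 8-dimensional embedding. The abstract does not give the paper's k, dimension, or embedding method. The random table is a stand-in for vectors that would normally be learned, for example with word2vec, and all function names are hypothetical.

```python
import random

def kmers(seq, k=3):
    """Split a DNA sequence into overlapping k-mers ("words") with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_vocab(seqs, k=3):
    """Assign an integer index to every distinct k-mer seen in the sequences."""
    vocab = {}
    for s in seqs:
        for w in kmers(s, k):
            vocab.setdefault(w, len(vocab))
    return vocab

def random_embeddings(vocab, dim=8, seed=0):
    """Placeholder embedding table (assumption: real vectors would be trained, not random)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}

def embed(seq, table, k=3):
    """Map a sequence to a (len(seq) - k + 1) x dim matrix of k-mer vectors."""
    return [table[w] for w in kmers(seq, k)]

if __name__ == "__main__":
    seq = "ACGTAC"
    print(kmers(seq))  # ['ACG', 'CGT', 'GTA', 'TAC']
    table = random_embeddings(build_vocab([seq]))
    print(len(embed(seq, table)), "vectors of length", len(table["ACG"]))
```

The resulting matrix, with one row per k-mer, is the kind of input that convolutional and recurrent layers consume in place of raw one-hot bases.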

Conclusions: Benefiting from its combination of different network structures, NP_CBiR can effectively extract both local features and base-order features of DNA sequences, and can therefore be considered a complementary tool for nucleosome positioning.
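The sketch below shows a forward pass that combines the two structures described above. Its design choices are assumptions, not NP_CBiR's published architecture. A convolution layer extracts local features from an embedded sequence. Forward and backward recurrent passes summarize those features in both directions. The two final states are concatenated and fed to a logistic output that gives a nucleosome-forming probability. The abstract does not say whether the CNN and BiRNN are stacked (as here) or run in parallel. It also does not give layer sizes or say whether LSTM or GRU cells are used. Plain tanh RNN cells are swapped in for simplicity. The weights are random and untrained, and every name and size is hypothetical.

```python
import math
import random

rng = random.Random(42)

def rand_matrix(rows, cols):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.3, 0.3) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def conv1d_relu(x, filters, width):
    """Valid 1-D convolution over a sequence of vectors, then ReLU.
    Each filter is a flat list of width * in_dim weights."""
    out = []
    for i in range(len(x) - width + 1):
        window = [val for vec in x[i:i + width] for val in vec]  # flatten the local window
        out.append([max(0.0, sum(w * v for w, v in zip(f, window))) for f in filters])
    return out

def rnn_final(x, w_in, w_rec):
    """Tanh RNN h_t = tanh(W_in x_t + W_rec h_{t-1}); returns the last hidden state."""
    h = [0.0] * len(w_rec)
    for x_t in x:
        pre = [a + b for a, b in zip(matvec(w_in, x_t), matvec(w_rec, h))]
        h = [math.tanh(p) for p in pre]
    return h

def predict(x, p):
    """Local conv features -> bidirectional RNN -> logistic probability in (0, 1)."""
    local = conv1d_relu(x, p["filters"], p["width"])
    fwd = rnn_final(local, p["w_in_f"], p["w_rec_f"])
    bwd = rnn_final(local[::-1], p["w_in_b"], p["w_rec_b"])  # backward pass over reversed features
    feats = fwd + bwd  # concatenate forward and backward summaries
    z = sum(w * f for w, f in zip(p["w_out"], feats)) + p["b_out"]
    return 1.0 / (1.0 + math.exp(-z))

EMB, CH, HID, WIDTH = 8, 4, 5, 3  # hypothetical sizes
params = {
    "width": WIDTH,
    "filters": rand_matrix(CH, WIDTH * EMB),
    "w_in_f": rand_matrix(HID, CH), "w_rec_f": rand_matrix(HID, HID),
    "w_in_b": rand_matrix(HID, CH), "w_rec_b": rand_matrix(HID, HID),
    "w_out": [rng.uniform(-0.3, 0.3) for _ in range(2 * HID)],
    "b_out": 0.0,
}

if __name__ == "__main__":
    # A fake embedded sequence: 20 k-mer vectors of length EMB.
    x = [[rng.uniform(-0.5, 0.5) for _ in range(EMB)] for _ in range(20)]
    print(predict(x, params))
```

In practice the weights would be fit by gradient descent on labeled nucleosome-forming and nucleosome-inhibiting sequences. This sketch only shows how the local and order-aware features flow into a single prediction.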

Keywords: Bidirectional recurrent neural network; Convolutional neural network; Deep learning; Nucleosome positioning; Word vector.

MeSH terms

  • Animals
  • Base Sequence
  • Caenorhabditis elegans / genetics
  • Deep Learning*
  • Drosophila melanogaster / genetics
  • Nucleosomes* / genetics
  • Plant Extracts

Substances

  • Nucleosomes
  • Plant Extracts