Multi-Scale Capsule Network for Predicting DNA-Protein Binding Sites

IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1793-1800. doi: 10.1109/TCBB.2020.3025579. Epub 2021 Oct 7.

Abstract

Discovering DNA-protein binding sites, also known as motif discovery, is the foundation for further analysis of transcription factors (TFs). Deep learning algorithms such as convolutional neural networks (CNNs) have been introduced to the motif discovery task and have achieved state-of-the-art performance. However, due to the limitations of CNNs, CNN-based motif discovery methods do not take full advantage of the large-scale sequencing data generated by high-throughput sequencing technology. Hence, in this paper we propose a multi-scale capsule network architecture (MSC) that integrates a multi-scale CNN, a CNN variant able to extract motif features of different lengths, with a capsule network, a novel type of artificial neural network architecture aimed at improving on CNNs. The proposed method is tested on real ChIP-seq datasets, and the experimental results show a considerable improvement compared with two well-tested deep learning-based sequence models, DeepBind and DeepSEA.
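
To make the two ingredients named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a one-hot encoding of the DNA sequence, convolution filters of several widths standing in for the multi-scale CNN, and the standard capsule "squash" nonlinearity from the capsule network literature. The function names, the filter widths (8, 16, 24), the number of filters, and the random untrained weights are all illustrative assumptions and do not come from the paper.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases (e.g. N) stay all-zero."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def conv1d_valid(x, kernels):
    """Slide each kernel over x without padding.

    x: (length, 4); kernels: (n_filters, width, 4).
    Returns an array of shape (length - width + 1, n_filters).
    """
    _, width, _ = kernels.shape
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    return np.einsum("lwc,fwc->lf", windows, kernels)

def squash(s, eps=1e-8):
    """Capsule squash: keeps each vector's direction and maps its length into [0, 1)."""
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def multi_scale_capsules(seq, widths=(8, 16, 24), n_filters=4, seed=0):
    """Hypothetical helper: one capsule vector per filter width.

    The weights are random, so this only illustrates the data flow, not a trained model.
    """
    rng = np.random.default_rng(seed)
    x = one_hot(seq)
    capsules = []
    for width in widths:
        kernels = rng.normal(scale=0.1, size=(n_filters, width, 4))
        activation = np.maximum(conv1d_valid(x, kernels), 0.0)  # ReLU
        capsules.append(activation.max(axis=0))                 # global max pooling
    return squash(np.stack(capsules))

caps = multi_scale_capsules("ACGTTGCA" * 5)
print(caps.shape)                     # one row per filter width
print(np.linalg.norm(caps, axis=-1))  # capsule lengths, each below 1
```

In this sketch each filter width responds to candidate motifs of a different length, and the squashed vector length can be read as a score for whether a feature is present. A full capsule network would additionally route between capsule layers, which is omitted here.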

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites / genetics*
  • Chromatin Immunoprecipitation Sequencing
  • Computational Biology / methods*
  • DNA-Binding Proteins* / chemistry
  • DNA-Binding Proteins* / genetics
  • DNA-Binding Proteins* / metabolism
  • Deep Learning*
  • Protein Binding / genetics
  • Sequence Analysis, Protein
  • Transcription Factors* / chemistry
  • Transcription Factors* / genetics
  • Transcription Factors* / metabolism

Substances

  • DNA-Binding Proteins
  • Transcription Factors