Predicting In-Vitro DNA-Protein Binding With a Spatially Aligned Fusion of Sequence and Shape

IEEE/ACM Trans Comput Biol Bioinform. 2022 Nov-Dec;19(6):3144-3153. doi: 10.1109/TCBB.2021.3133869. Epub 2022 Dec 8.

Abstract

Discovery of transcription factor binding sites (TFBSs) is of primary importance for understanding the underlying binding mechanisms and the gene regulation process. Growing evidence indicates that, apart from the primary DNA sequence, the DNA shape landscape has a significant influence on transcription factor binding preference. To effectively model the joint influence of sequence and shape features, we emphasize the importance of the positional information of sequence motifs and shape patterns. In this paper, we propose a novel deep learning-based architecture, named hybridShape eDeepCNN, for TFBS prediction that integrates DNA sequence and shape information in a spatially aligned manner. Our model uses a multi-layer convolutional neural network and constructs independent subnetworks to accommodate the distinct data distributions of the heterogeneous features. In addition, we explore the use of continuous embedding vectors as the representation of DNA sequences. Through experiments on 20 in-vitro datasets derived from universal protein binding microarrays (uPBMs), we demonstrate the superiority of our proposed method and validate the underlying design logic.
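To make "spatially aligned" fusion concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a one-hot sequence encoding (rather than the continuous embeddings the paper also explores), a hypothetical per-base shape matrix with four channels, arbitrary layer sizes and kernel widths, and random untrained weights. The key idea shown is that each branch uses "same"-padded convolutions, so row i of both feature maps still refers to base i. This lets the two maps be concatenated channel-wise position by position before a joint convolution, global max pooling, and a linear read-out.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]  # rows = sequence positions, columns = channels

BASES = "ACGT"


def one_hot(seq: str) -> Matrix:
    """Encode a DNA sequence as a (length, 4) one-hot matrix in A, C, G, T order."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d_relu(x: Matrix, kernels: List[Matrix], bias: List[float]) -> Matrix:
    """'Same'-padded 1D convolution followed by ReLU.

    x has shape (L, C_in); each kernel has shape (K, C_in) with K odd, so the
    output keeps length L and row i still corresponds to base i.
    """
    length = len(x)
    out = []
    for i in range(length):
        row = []
        for kern, b in zip(kernels, bias):
            half = len(kern) // 2
            s = b
            for j, w_row in enumerate(kern):
                p = i + j - half
                if 0 <= p < length:  # zero padding outside the sequence
                    s += sum(w * v for w, v in zip(w_row, x[p]))
            row.append(max(0.0, s))
        out.append(row)
    return out


def init_layer(n_out: int, k: int, c_in: int,
               rng: random.Random) -> Tuple[List[Matrix], List[float]]:
    """Random (untrained, illustrative) weights for one conv layer."""
    kernels = [[[rng.uniform(-0.5, 0.5) for _ in range(c_in)] for _ in range(k)]
               for _ in range(n_out)]
    return kernels, [0.0] * n_out


def fuse_aligned(a: Matrix, b: Matrix) -> Matrix:
    """Concatenate two feature maps channel-wise, position by position."""
    if len(a) != len(b):
        raise ValueError("feature maps must cover the same positions")
    return [ra + rb for ra, rb in zip(a, b)]


def hybrid_forward(seq: str, shape: Matrix, rng: random.Random) -> float:
    """Score one probe: independent sequence and shape branches, aligned
    fusion, a joint conv layer, global max pooling, and a linear read-out."""
    seq_map = conv1d_relu(one_hot(seq), *init_layer(8, 5, 4, rng))            # (L, 8)
    shape_map = conv1d_relu(shape, *init_layer(8, 5, len(shape[0]), rng))     # (L, 8)
    fused = fuse_aligned(seq_map, shape_map)                                  # (L, 16)
    joint = conv1d_relu(fused, *init_layer(4, 3, len(fused[0]), rng))         # (L, 4)
    pooled = [max(col) for col in zip(*joint)]                                # (4,)
    w_out = [rng.uniform(-0.5, 0.5) for _ in pooled]
    return sum(w * v for w, v in zip(w_out, pooled))


if __name__ == "__main__":
    probe = "ACGTGCATGCAT"
    # Hypothetical shape track: 4 made-up values per base (e.g. MGW, ProT, Roll, HelT).
    data_rng = random.Random(0)
    shape = [[data_rng.random() for _ in range(4)] for _ in probe]
    print(hybrid_forward(probe, shape, random.Random(1)))
```

Keeping the two branches separate lets each one learn filters suited to its own input distribution (binary one-hot values versus real-valued shape measurements). The position-wise concatenation preserves which shape values co-occur with which bases. In practice, shape values would be predicted from the probe sequence by a dedicated tool, and the weights would be trained on the uPBM binding signal.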

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites / genetics
  • DNA / chemistry
  • DNA-Binding Proteins* / metabolism
  • Protein Binding
  • Transcription Factors* / metabolism

Substances

  • Transcription Factors
  • DNA-Binding Proteins
  • DNA