BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning

Brief Bioinform. 2024 Mar 27;25(3):bbae195. doi: 10.1093/bib/bbae195.

Abstract

Transcription factors (TFs) are proteins essential for regulating genetic transcriptions by binding to transcription factor binding sites (TFBSs) in DNA sequences. Accurate predictions of TFBSs can contribute to the design and construction of metabolic regulatory systems based on TFs. Although various deep-learning algorithms have been developed for predicting TFBSs, the prediction performance needs to be improved. This paper proposes a bidirectional encoder representations from transformers (BERT)-based model, called BERT-TFBS, to predict TFBSs solely based on DNA sequences. The model consists of a pre-trained BERT module (DNABERT-2), a convolutional neural network (CNN) module, a convolutional block attention module (CBAM) and an output module. The BERT-TFBS model utilizes the pre-trained DNABERT-2 module to acquire the complex long-term dependencies in DNA sequences through a transfer learning approach, and applies the CNN module and the CBAM to extract high-order local features. The proposed model is trained and tested based on 165 ENCODE ChIP-seq datasets. We conducted experiments with model variants, cross-cell-line validations and comparisons with other models. The experimental results demonstrate the effectiveness and generalization capability of BERT-TFBS in predicting TFBSs, and they show that the proposed model outperforms other deep-learning models. The source code for BERT-TFBS is available at https://github.com/ZX1998-12/BERT-TFBS.

Keywords: BERT; attention mechanisms; bioinformatics; deep learning; gene regulation; transcription factor binding sites.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites
  • Computational Biology / methods
  • Deep Learning
  • Humans
  • Neural Networks, Computer*
  • Protein Binding
  • Transcription Factors* / genetics
  • Transcription Factors* / metabolism

Substances

  • Transcription Factors