A survey on protein-DNA-binding sites in computational biology

Brief Funct Genomics. 2022 Sep 16;21(5):357-375. doi: 10.1093/bfgp/elac009.

Abstract

Transcription factors are important cellular components in the control of gene expression. Transcription factor binding sites are the locations where transcription factors specifically recognize DNA sequences, targeting gene-specific regions and recruiting transcription factors or chromatin regulators to fine-tune spatiotemporal gene regulation. As common proteins, transcription factors play an important role in many biological processes. With the rapid growth in the number of known protein sequences, there is an urgent need for methods that predict protein structure and function effectively. Current protein-DNA-binding site prediction methods are based on either traditional machine learning algorithms or deep learning algorithms. Early methods for predicting protein-DNA-binding sites were mostly developed with traditional machine learning algorithms. In recent years, deep learning methods that predict protein-DNA-binding sites from sequence data have achieved remarkable success. Various statistical and machine learning methods for predicting the function of DNA-binding proteins have been proposed and continuously improved. Existing deep learning methods for predicting protein-DNA-binding sites can be roughly divided into three categories: convolutional neural networks (CNNs), recurrent neural networks (RNNs) and hybrid CNN-RNN networks. The purpose of this review is to provide an overview of the computational and experimental methods currently applied to protein-DNA-binding site prediction. We introduce traditional machine learning and deep learning methods for protein-DNA-binding site prediction, covering the data processing characteristics of existing learning frameworks and the differences between the basic model architectures. Existing methods in this field remain relatively simple compared with those in natural language processing, computer vision, computer graphics and other fields. Therefore, this summary of existing protein-DNA-binding site prediction methods should help researchers better understand the field.
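To make the CNN category more concrete, here is a minimal sketch, not taken from the review, of the first stage shared by many CNN-based binding-site predictors. A DNA sequence is one-hot encoded, a single convolutional filter is scanned along it with a ReLU activation, and global max pooling keeps the strongest match. The kernel is hand-set to reward the hypothetical motif "TATA". In a trained CNN the filter weights are learned from labeled binding data instead. The function names are illustrative only.

```python
from typing import List

ALPHABET = "ACGT"  # column order of the one-hot encoding


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA sequence as an L x 4 one-hot matrix (columns A, C, G, T).
    Bases outside ACGT (e.g. N) become all-zero rows."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]


def conv1d_scan(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """Valid-mode 1-D cross-correlation (what CNN 'convolution' layers compute)
    of an L x 4 input with a K x 4 kernel, followed by ReLU.
    Returns one score per window, L - K + 1 scores in total."""
    L, K = len(x), len(kernel)
    scores = []
    for i in range(L - K + 1):
        s = bias
        for k in range(K):
            for c in range(len(ALPHABET)):
                s += x[i + k][c] * kernel[k][c]
        scores.append(max(0.0, s))  # ReLU activation
    return scores


def max_pool(scores: List[float]) -> float:
    """Global max pooling: keep only the strongest motif match in the sequence."""
    return max(scores) if scores else 0.0


# Hypothetical hand-set filter: +1 for the motif base at each position, -1 otherwise.
motif = "TATA"
kernel = [[1.0 if b == m else -1.0 for b in ALPHABET] for m in motif]

x = one_hot("GCGTATAAGC")
scores = conv1d_scan(x, kernel)
print(scores)
print(max_pool(scores))
```

Only the window covering "TATA" scores above zero, and max pooling reduces the whole sequence to that single strongest match. An RNN-based model would instead read the one-hot rows (or, in hybrid CNN-RNN models, the convolution outputs) one position at a time, carrying a hidden state along the sequence.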

Keywords: DNA–protein-binding sites; bioinformatics; convolutional neural network; deep learning; machine learning; recurrent neural networks; transcription factor binding site.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Binding Sites
  • Chromatin
  • Computational Biology* / methods
  • DNA
  • DNA-Binding Proteins
  • Transcription Factors

Substances

  • Chromatin
  • DNA-Binding Proteins
  • Transcription Factors
  • DNA