BiCaps-DBP: Predicting DNA-binding proteins from protein sequences using Bi-LSTM and a 1D-capsule network

Comput Biol Med. 2023 Sep:163:107241. doi: 10.1016/j.compbiomed.2023.107241. Epub 2023 Jul 8.

Abstract

Predicting DNA-binding proteins (DBPs) from primary sequences alone is one of the most challenging problems in genome annotation. DBPs play a crucial role in various biological processes, including DNA replication, transcription, repair, and splicing. Some DBPs are essential in pharmaceutical research on various human cancers and autoimmune diseases. Existing experimental methods for identifying DBPs are time-consuming and costly, so a rapid and accurate computational technique is needed. This study introduces BiCaps-DBP, a deep learning-based method that improves DBP prediction performance by combining bidirectional long short-term memory (Bi-LSTM) with a 1D-capsule network. Three sets of training and independent datasets are used to evaluate the proposed model's generalizability and robustness. On the independent datasets PDB2272, PDB186 and PDB20000, BiCaps-DBP achieved accuracies 1.05%, 5.79% and 0.40% higher, respectively, than an existing predictor. These outcomes indicate that the proposed method is a promising DBP predictor.

Keywords: Bi-LSTM; Capsule network; DNA-Binding proteins; One-hot encoding.
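
The keywords list one-hot encoding as the sequence representation. The following is a minimal sketch of how a protein sequence could be one-hot encoded before being passed to a sequence model. It is not the authors' implementation: the function name `one_hot_encode`, the residue ordering in `AMINO_ACIDS`, the fixed `max_len` with zero padding, and the all-zero treatment of non-standard residues are all assumptions made for illustration.

```python
# Hypothetical sketch: one-hot encoding of a protein sequence.
# Alphabet order, padding, and unknown-residue handling are assumptions,
# not details taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Return a max_len x 20 matrix (list of lists) for a protein sequence.

    Sequences longer than max_len are truncated; shorter ones are padded
    with all-zero rows. Residues outside the 20-letter alphabet also map
    to all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence.upper()[:max_len]):
        idx = AA_INDEX.get(residue)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


# Example: a 4-residue sequence ("X" is a non-standard residue) padded to length 6.
encoded = one_hot_encode("MKVX", max_len=6)
```

Each row of `encoded` corresponds to one sequence position, so the matrix can be fed position by position to a recurrent layer such as a Bi-LSTM.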

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • DNA-Binding Proteins* / genetics
  • DNA-Binding Proteins* / metabolism
  • Genome*
  • Humans

Substances

  • DNA-Binding Proteins