TSNAPred: predicting type-specific nucleic acid binding residues via an ensemble approach

Brief Bioinform. 2022 Jul 18;23(4):bbac244. doi: 10.1093/bib/bbac244.

Abstract

The interplay between protein and nucleic acid participates in diverse biological activities. Accurately identifying the interaction between protein and nucleic acid can strengthen the understanding of protein function. However, conventional methods are too time-consuming, and computational methods are type-agnostic predictions. We proposed an ensemble predictor termed TSNAPred and first used it to identify residues that bind to A-DNA, B-DNA, ssDNA, mRNA, tRNA and rRNA. TSNAPred combines LightGBM and capsule network, both learned on the feature derived from protein sequence. TSNAPred utilizes the sliding window technique to extract long-distance dependencies between residues and a weighted ensemble strategy to enhance the prediction performance. The results show that TSNAPred can effectively identify type-specific nucleic acid binding residues in our test set. What is more, it also can discriminate DNA-binding and RNA-binding residues, which has improved 5% to 10% on the AUC value compared with other state-of-the-art methods. The dataset and code of TSNAPred are available at: https://github.com/niewenjuan-csu/TSNAPred.

Keywords: Capsule network; DNA-binding residues; LightGBM; RNA-binding residues.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computational Biology* / methods
  • Nucleic Acids* / metabolism
  • Protein Binding
  • Proteins / metabolism

Substances

  • Nucleic Acids
  • Proteins