BGFE: A Deep Learning Model for ncRNA-Protein Interaction Predictions Based on Improved Sequence Information

Int J Mol Sci. 2019 Feb 23;20(4):978. doi: 10.3390/ijms20040978.

Abstract

Interactions between ncRNAs and proteins are critical for regulating various cellular processes in organisms, such as gene expression. However, experimental methods for identifying ncRNA-protein interactions are limited by, among other things, their financial and material costs, so a practical computational approach with convincing prediction accuracy is needed. In this study, we put forward a deep learning method, named BGFE, that predicts ncRNA-protein interactions from sequence information. Protein sequences are represented by bi-gram probability features extracted from the Position Specific Scoring Matrix (PSSM), and ncRNA sequences are represented by k-mers sparse matrices. To extract hidden high-level feature information, a stacked auto-encoder network is employed with a stacked ensemble integration strategy. We evaluate the proposed method on three datasets using five-fold cross-validation, with the extracted features classified by a random forest classifier. The experimental results demonstrate the effectiveness and prediction accuracy of our approach. In general, the proposed method is helpful for predicting ncRNA-protein interactions and provides useful guidance for future biological research.
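The two sequence encodings named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the code below rests on assumptions: the bi-gram feature is taken in its commonly used form, B[m, n] = sum over i of P[i, m] * P[i+1, n] for an L x 20 PSSM P, and the k-mers sparse matrix is taken to be a 4^k x (L - k + 1) indicator matrix over the alphabet ACGU. The function names `bigram_pssm_features` and `kmer_sparse_matrix` are hypothetical and are not taken from the BGFE implementation.

```python
import itertools

import numpy as np


def bigram_pssm_features(pssm):
    """Return a 400-dim bi-gram feature vector from an (L, 20) PSSM.

    Assumed formulation: B[m, n] = sum_i pssm[i, m] * pssm[i + 1, n].
    """
    pssm = np.asarray(pssm, dtype=float)
    # Rows 0..L-2 transposed, times rows 1..L-1, gives the (20, 20) sum
    # of products over consecutive sequence positions.
    bigram = pssm[:-1].T @ pssm[1:]
    return bigram.ravel()


def kmer_sparse_matrix(rna, k=3, alphabet="ACGU"):
    """Return a (len(alphabet)**k, L - k + 1) 0/1 indicator matrix.

    Assumed layout: entry (i, j) is 1 when the k-mer starting at
    position j of the sequence is the i-th k-mer in lexicographic order.
    """
    kmers = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    n_pos = max(len(rna) - k + 1, 0)
    mat = np.zeros((len(kmers), n_pos), dtype=np.int8)
    for j in range(n_pos):
        row = index.get(rna[j:j + k])
        # k-mers containing symbols outside the alphabet leave an all-zero column.
        if row is not None:
            mat[row, j] = 1
    return mat
```

For a protein of any length the bi-gram vector has a fixed size of 400, which is what makes it usable as classifier input; the k-mer matrix grows with sequence length and would need a further reduction step (not specified in the abstract) before being fed to a stacked auto-encoder.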

Keywords: bi-gram; deep learning; k-mers; ncRNA-protein interaction; position specific scoring matrix.

MeSH terms

  • Amino Acid Sequence
  • Computational Biology / methods*
  • Databases, Protein
  • Deep Learning*
  • Position-Specific Scoring Matrices
  • Protein Binding
  • RNA, Untranslated / genetics*
  • ROC Curve
  • Software*

Substances

  • RNA, Untranslated