AAindex-PPII: Predicting polyproline type II helix structure based on amino acid indexes with an improved BiGRU-TextCNN model

J Bioinform Comput Biol. 2023 Oct;21(5):2350022. doi: 10.1142/S0219720023500221. Epub 2023 Oct 28.

Abstract

The polyproline-II (PPII) structure domain is crucial in organisms' signal transduction, transcription, cell metabolism, and immune response. It is also a critical structural domain for specific vital disease-associated proteins. Recognizing PPII is essential for understanding protein structure and function. To accurately predict PPII in proteins, we propose a novel method, AAindex-PPII, which only adopts amino acid index to characterize protein sequences and uses a Bidirectional Gated Recurrent Unit (BiGRU)-Improved TextCNN composite deep learning model to predict PPII in proteins. Experimental results show that, when tested on the same datasets, our method outperforms the state-of-the-art BERT-PPII method, achieving an AUC value of 0.845 on the strict data and an AUC value of 0.813 on the non-strict data, which is 0.024 and 0.03 higher than that of the BERT-PPII method. This study demonstrates that our proposed method is simple and efficient for PPII prediction without using pre-trained large models or complex features such as position-specific scoring matrices.

Keywords: AAindex; BiGRU; Polyproline-II helix; TextCNN; structure.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Amino Acids*
  • Peptides* / chemistry
  • Proteins

Substances

  • polyproline
  • Amino Acids
  • Peptides
  • Proteins