PSSP: Protein splice site prediction algorithm using Bayesian approach

J Bioinform Comput Biol. 2019 Dec;17(6):1950034. doi: 10.1142/S0219720019500343.

Abstract

This study introduces an algorithm for identifying the intein motifs and blocks involved in protein splicing, and explores the methods underlying the detection of protein motifs. Inteins are mobile protein splicing elements capable of post-translational self-splicing. They exist in viruses and bacteriophages; despite this broad phylogenetic distribution, all inteins share common structural features. We developed a method to predict inteins in a raw sequence using a ranking and scoring scheme based on amino acid θ value tables. This method aided in identifying and assessing the patterns that characterize intein sequences. New conserved intein properties are revealed, and the known ones are described and localized. We computed the θ value of each amino acid at block A positions +1 to +13, block B positions l+13 to l+26, and block G positions -7 to +1 for the three categories. The consensus amino acids thus found are listed at the end of each row. We report statistics for the distances between blocks: block A to B, block B to F, and block F to G average 66.1, 294, and 10.2 amino acids, respectively. The actual blocks A, B, and G of the one intein found in the vacuolar membrane ATPase subunit, a precursor protein, are ranked first. The results indicate that all of the block sequences found in nine proteins are ranked at the top of the list. The intein sequence is then used to search the databases for intein-like proteins. Understanding the functional, structural, and dynamical aspects of inteins is important for intein engineering and for improving the intein database.
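
The ranking and scoring idea described above can be illustrated with a minimal sketch. It is not the PSSP implementation: the abstract does not give the scoring formula, so the sketch assumes a window's score is the sum of log θ values over its positions, with a small floor for residues missing from the table. The names `score_window`, `rank_windows`, and `floor`, as well as the three-position toy table and its values, are hypothetical and chosen only for illustration.

```python
import math


def score_window(window, theta, floor=1e-3):
    """Score one candidate block as the sum of log theta values.

    `theta` is a list of dicts, one per block position, mapping an
    amino acid letter to its (assumed) theta value at that position.
    Residues absent from a position's table get the `floor` value.
    """
    return sum(math.log(theta[i].get(aa, floor)) for i, aa in enumerate(window))


def rank_windows(sequence, theta, floor=1e-3):
    """Slide a block-sized window over the sequence and rank all candidates.

    Returns (score, start_index, window) tuples, best score first.
    """
    width = len(theta)
    scored = [
        (score_window(sequence[s:s + width], theta, floor), s, sequence[s:s + width])
        for s in range(len(sequence) - width + 1)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy table with made-up values for a three-position block (illustration only;
# the blocks in the abstract are longer, e.g. 13 positions for block A).
toy_theta = [
    {"C": 0.9, "S": 0.1},
    {"L": 0.6, "I": 0.4},
    {"A": 0.5, "S": 0.5},
]

ranked = rank_windows("MKCLAGGSIS", toy_theta)
best_score, best_start, best_window = ranked[0]
print(best_start, best_window)
```

In this toy run the top-ranked candidate is the window that matches the highest-θ residue at every position, which mirrors the abstract's report that the true blocks are ranked first among all candidates in a precursor sequence.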

Keywords: Protein splicing; algorithm; intein; intein engineering; motif; protein domain.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Bayes Theorem
  • Computational Biology / methods*
  • Conserved Sequence
  • Inteins*
  • Protein Isoforms
  • Protein Splicing*

Substances

  • Protein Isoforms