Structure-sequence features based prediction of phosphosites of serine/threonine protein kinases of Mycobacterium tuberculosis

Proteins. 2022 Jan;90(1):131-141. doi: 10.1002/prot.26195. Epub 2021 Aug 26.

Abstract

Elucidating signaling events in a pathogen is potentially important for tackling the infection it causes. Signaling events mediated by protein phosphorylation play important roles in infection. To predict the phosphosites and substrates of Mycobacterium tuberculosis serine/threonine protein kinases (STPKs), we developed a machine learning-based approach that uses kinase-peptide structure-sequence data. The approach uses features derived from the kinase three-dimensional structural environment and known phosphosite sequences to generate support vector machine (SVM)-based, kinase-specific predictions of phosphosites for STPKs with little or no substrate data. Of the four machine learning algorithms we tried (random forest, logistic regression, SVM, and k-nearest neighbors), SVM performed best, with an area under the receiver operating characteristic curve (AUC-ROC) of 0.88 on the independent testing dataset and a 10-fold cross-validation accuracy of ~81.6% for the final model. Our predicted phosphosites of M. tuberculosis STPKs form a useful resource for experimental biologists, enabling elucidation of STPK-mediated posttranslational regulation of important cellular processes.
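The two evaluation measures named above, the area under the ROC curve and 10-fold cross-validation, can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the function names `roc_auc` and `k_fold_indices` are hypothetical, the labels and scores are toy values invented for illustration, and the fold assignment is a plain shuffled split that may differ from the scheme used in the study.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score. Tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.
    Indices are shuffled once, then dealt into k folds; each fold
    serves as the test set exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Toy data (assumption): 1 = phosphosite, 0 = non-phosphosite,
# with made-up classifier scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]

# 7 of the 9 positive-negative pairs are ranked correctly.
print(round(roc_auc(toy_labels, toy_scores), 3))  # 0.778

# Each of the 10 folds holds out a disjoint tenth of the samples.
for train, test in k_fold_indices(n_samples=50, k=10):
    assert not set(train) & set(test)
```

In practice a cross-validation accuracy would be obtained by fitting the classifier on each `train` split, scoring it on the matching `test` split, and averaging the per-fold accuracies.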

Keywords: Mycobacterium tuberculosis; computational prediction; phosphorylation; serine/threonine protein kinases; support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins* / chemistry
  • Bacterial Proteins* / genetics
  • Bacterial Proteins* / metabolism
  • Computational Biology
  • Mycobacterium tuberculosis / enzymology*
  • Phosphorylation
  • Protein Serine-Threonine Kinases* / chemistry
  • Protein Serine-Threonine Kinases* / genetics
  • Protein Serine-Threonine Kinases* / metabolism
  • Signal Transduction
  • Support Vector Machine

Substances

  • Bacterial Proteins
  • Protein Serine-Threonine Kinases