[Application of molecular descriptors for recognition of phosphorylation sites in amino acid sequences]

Biomed Khim. 2017 Oct;63(5):423-427. doi: 10.18097/PBMC20176305423.
[Article in Russian]

Abstract

Recognition of phosphorylation sites in proteins is required for reconstructing regulatory processes in living systems. The task is complicated because phosphorylation motifs in amino acid sequences are considerably degenerate. To improve prediction performance, researchers often use additional descriptors intended to reflect the physicochemical features of the regions surrounding a site. We evaluated the usefulness of this approach by applying MNA (Multilevel Neighborhoods of Atoms) molecular descriptors to the structural representation of peptide segments. Comparative testing was performed using the prediction method PASS and two types of input data for the same peptides: sets of MNA descriptors, which represent the peptides as chemical structures, and amino acid sequences written in one-letter code. Training sets were classified according to the established type of the enzyme (protein kinase) that modifies the corresponding phosphorylation sites. The accuracy estimates obtained by validating the predictions differed significantly between classes of substrates, with both the letter description and the molecular descriptors. With the letter description, prediction accuracy depended less on the length of the peptides in the training set, whereas with structural descriptors the accuracy was determined by the peptide size and by the level of the MNA descriptors. The maximal accuracy in predicting specificity for different kinase families was achieved at different sizes of the molecular fragments covered by MNA descriptors of the corresponding levels. This apparently reflected structural differences in the surroundings of phosphorylation sites modified by protein kinases of different types. The use of molecular descriptors provided prediction results comparable in accuracy with those obtained using the traditional letter representation. At high accuracy values, prediction accuracy depended less on the method used to describe the site-containing peptides. At the same time, MNA descriptors made it possible to achieve better accuracy in cases where the letter description did not provide acceptable accuracy.
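
The letter-based input described above depends on peptide segments of a chosen length centered on a candidate site. The following Python sketch illustrates only that preprocessing step and is not the PASS or MNA procedure used in the study. It assumes that candidate sites are serine, threonine and tyrosine residues and that windows running past the sequence ends are padded with "-"; the names `site_windows`, `SiteWindow` and `flank` are hypothetical.

```python
from typing import Iterator, NamedTuple

# Assumption: only Ser, Thr and Tyr are treated as candidate sites.
PHOSPHO_RESIDUES = frozenset("STY")


class SiteWindow(NamedTuple):
    position: int  # 1-based position of the central residue
    residue: str   # one-letter code of the central residue
    peptide: str   # window of length 2 * flank + 1, padded at the ends


def site_windows(sequence: str, flank: int, pad: str = "-") -> Iterator[SiteWindow]:
    """Yield a fixed-length window around every S, T or Y in the sequence."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    if len(pad) != 1:
        raise ValueError("pad must be a single character")
    seq = sequence.upper()
    # Padding both ends keeps every window the same length,
    # even for sites close to the N- or C-terminus.
    padded = pad * flank + seq + pad * flank
    for i, residue in enumerate(seq):
        if residue in PHOSPHO_RESIDUES:
            yield SiteWindow(i + 1, residue, padded[i : i + 2 * flank + 1])


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the study.
    for flank in (2, 3):
        for window in site_windows("MKRSPTA", flank):
            print(flank, window.position, window.residue, window.peptide)
```

Varying `flank` changes the peptide length, which is the parameter whose effect on accuracy the abstract reports as weaker for the letter description than for the structural one.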

Keywords: amino acid sequences; molecular descriptors; phosphorylation motifs; protein phosphorylation; site prediction.

MeSH terms

  • Peptides / chemistry*
  • Phosphorylation*
  • Proteins / chemistry*
  • Sequence Analysis, Protein*

Substances

  • Peptides
  • Proteins