Identify and analysis crotonylation sites in histone by using support vector machines

Artif Intell Med. 2017 Nov:83:75-81. doi: 10.1016/j.artmed.2017.02.007. Epub 2017 Mar 7.

Abstract

Objective: Lysine crotonylation (Kcr) is a newly discovered histone posttranslational modification that is specifically enriched at active gene promoters and potential enhancers in mammalian cell genomes. Although lysine crotonylation sites can be identified accurately with high-resolution mass spectrometry, such experiments are time-consuming and expensive. Computational methods for predicting Kcr sites are therefore needed.

Methods: We proposed a new encoding scheme, named position weight amino acid composition, to extract sequence information from the histone residues surrounding crotonylation sites. Protein data were taken from the UniProt database. A series of steps was used to construct a strict and objective benchmark dataset for training and testing the proposed method. All samples were characterized by a significant number of features derived from position weight amino acid composition. A support vector machine was used to perform classification.
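
The abstract does not give the formula for position weight amino acid composition. The sketch below is only an illustration, assuming the formulation used in earlier position-weight encodings: for each of the 20 amino acids, the contributions of its occurrences at offsets j = -L..L around the central lysine are summed as j + |j|/L and divided by L(L+1). The function name `pwaa`, the constant `AMINO_ACIDS`, and the example window are hypothetical and not taken from the paper.

```python
# Illustrative sketch of a position-weighted amino acid composition encoding.
# ASSUMPTION: the weighting (j + |j|/L) / (L * (L + 1)) follows earlier
# position-weight encodings; the abstract itself does not state the formula.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def pwaa(window: str) -> list[float]:
    """Encode a sequence window centered on a candidate lysine.

    The window must have odd length 2L + 1 with L >= 1. Characters outside
    the 20 standard residues (e.g. padding symbols) contribute nothing.
    Returns one value per amino acid, in the order of AMINO_ACIDS.
    """
    if len(window) < 3 or len(window) % 2 == 0:
        raise ValueError("window length must be odd and at least 3")
    half = len(window) // 2          # L: residues on each side of the center
    norm = half * (half + 1)
    features = []
    for aa in AMINO_ACIDS:
        total = 0.0
        for idx, residue in enumerate(window):
            if residue == aa:
                j = idx - half       # offset relative to the central residue
                total += j + abs(j) / half
        features.append(total / norm)
    return features


# Hypothetical 5-residue window with the candidate lysine in the middle.
example = pwaa("AAKAA")
```

Under this assumption, each window becomes a 20-dimensional vector that could be passed to any support vector machine implementation; the kernel and parameters used by the authors are not stated in the abstract.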

Results: In jackknife cross-validation, the sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC) were 71.69%, 98.7%, 94.43%, and 0.778, respectively. In a direct comparison, the proposed model outperformed a random forest algorithm. We also performed a feature analysis on the samples.
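
For reference, the four reported measures have standard definitions in terms of the confusion-matrix counts (true/false positives and negatives). The following is a minimal sketch of those definitions; the function name `binary_metrics` and the counts in the example are hypothetical and are not the paper's data.

```python
# Standard definitions of Sn, Sp, Acc, and MCC from confusion-matrix counts.
# The counts used in the example are hypothetical, not results from the paper.
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Return sensitivity, specificity, accuracy, and MCC."""
    sn = tp / (tp + fn)                      # fraction of true sites recovered
    sp = tn / (tn + fp)                      # fraction of non-sites rejected
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for illustration only.
scores = binary_metrics(tp=8, fn=2, tn=85, fp=5)
```

In jackknife (leave-one-out) cross-validation, each sample is held out once and predicted by a model trained on all remaining samples; the counts are then accumulated over all held-out predictions before these measures are computed.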

Conclusion: Identifying Kcr sites in histones is an indispensable step in decoding protein function. The method can therefore promote a deeper understanding of the physiological roles of crotonylation and provide useful information for developing drugs to treat diseases associated with crotonylation.

Keywords: Crotonyllysine; PTMs; Sequence information; Support vector machine.

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Computational Biology / methods*
  • Databases, Protein
  • Histones / chemistry
  • Histones / metabolism*
  • Humans
  • Lysine
  • Mice
  • Protein Processing, Post-Translational*
  • Reproducibility of Results
  • Support Vector Machine*

Substances

  • Histones
  • Lysine