Machine learning-based approaches for ubiquitination site prediction in human proteins

BMC Bioinformatics. 2023 Nov 28;24(1):449. doi: 10.1186/s12859-023-05581-w.

Abstract

Protein ubiquitination is a critical post-translational modification (PTMs) involved in numerous cellular processes. Identifying ubiquitination sites (Ubi-sites) on proteins offers valuable insights into their function and regulatory mechanisms. Due to the cost- and time-consuming nature of traditional approaches for Ubi-site detection, there has been a growing interest in leveraging artificial intelligence for computer-aided Ubi-site prediction. In this study, we collected experimentally verified Ubi-sites of human proteins from the dbPTM database, then conducted comprehensive state-of-the art computational methods along with standard evaluation metrics and a proper validation strategy for Ubi-site prediction. We presented the effectiveness of our framework by comparing ten machine learning (ML) based approaches in three different categories: feature-based conventional ML methods, end-to-end sequence-based deep learning (DL) techniques, and hybrid feature-based DL models. Our results revealed that DL approaches outperformed the classical ML methods, achieving a 0.902 F1-score, 0.8198 accuracy, 0.8786 precision, and 0.9147 recall as the best performance for a DL model using both raw amino acid sequences and hand-crafted features. Interestingly, our experimental results disclosed that the performance of DL methods had a positive correlation with the length of amino acid fragments, suggesting that utilizing the entire sequence can lead to more accurate predictions in future research endeavors. Additionally, we developed a meticulously curated benchmark for Ubi-site prediction in human proteins. This benchmark serves as a valuable resource for future studies, enabling fair and accurate comparisons between different methods. Overall, our work highlights the potential of ML, particularly DL techniques, in predicting Ubi-sites and furthering our knowledge of protein regulation through ubiquitination in cells.

Keywords: Deep learning; Machine learning; Post-translational modification; Ubiquitination.

MeSH terms

  • Artificial Intelligence*
  • Computational Biology / methods
  • Humans
  • Machine Learning
  • Protein Processing, Post-Translational
  • Proteins* / chemistry
  • Ubiquitination

Substances

  • Proteins