StackCBPred: A stacking based prediction of protein-carbohydrate binding sites from sequence

Carbohydr Res. 2019 Dec 1:486:107857. doi: 10.1016/j.carres.2019.107857. Epub 2019 Oct 24.

Abstract

Carbohydrate-binding proteins play vital roles in many important biological processes. Studying these protein-carbohydrate interactions at the residue level is useful in treating many critical diseases. Predicting protein-carbohydrate binding sites by analyzing the local sequence environments of binding and non-binding regions is a challenging problem in molecular and computational biology. Existing experimental methods for identifying protein-carbohydrate binding sites are laborious and expensive. Computational prediction of such binding sites directly from sequence can therefore help annotate binding sites quickly and guide the experimental process. Because carbohydrate-binding residues are far fewer than non-carbohydrate-binding residues, most methods developed for predicting protein-carbohydrate binding sites are biased towards overpredicting the negative (non-carbohydrate-binding) class. Here, we propose a balanced predictor, called StackCBPred, which uses features extracted from an evolution-derived sequence profile, the position-specific scoring matrix (PSSM), together with several predicted structural properties of amino acids, to train a stacking-based machine learning method for the accurate prediction of protein-carbohydrate binding sites (https://bmll.cs.uno.edu/).
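As an illustration of what describing a residue by its local sequence environment can look like in practice, the sketch below builds a per-residue feature vector by concatenating the profile rows in a window centred on that residue. It is a minimal sketch and not the StackCBPred implementation: the function name `window_features`, the `half_width` parameter, and the zero-padding at the sequence ends are assumptions made here for illustration, and the abstract does not state the window size or padding scheme actually used.

```python
from typing import List


def window_features(profile: List[List[float]], half_width: int) -> List[List[float]]:
    """Concatenate profile rows in a window centred on each residue.

    `profile` holds one row of scores per residue (a real PSSM has 20
    columns per residue). Window positions that fall outside the sequence
    are zero-padded; this padding choice is an assumption of this sketch.
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    n_cols = len(profile[0]) if profile else 0
    pad = [0.0] * n_cols
    features = []
    for i in range(len(profile)):
        row: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            # Use the real profile row when j is inside the sequence,
            # otherwise the all-zero padding row.
            row.extend(profile[j] if 0 <= j < len(profile) else pad)
        features.append(row)
    return features


# Toy 4-residue profile with 2 columns, purely for demonstration.
toy_profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
toy_features = window_features(toy_profile, half_width=1)
```

Each residue then has a fixed-length vector of `(2 * half_width + 1) * n_cols` values, which is the form a residue-level binary classifier (binding versus non-binding) expects as input.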

Keywords: Binding prediction; Machine learning; Protein-carbohydrate binding; Stacking.

MeSH terms

  • Binding Sites
  • Carbohydrate Metabolism*
  • Models, Molecular*
  • Protein Binding
  • Proteins / metabolism*

Substances

  • Proteins