Identification of ligand-binding residues using protein sequence profile alignment and query-specific support vector machine model

Anal Biochem. 2020 Sep 1:604:113799. doi: 10.1016/j.ab.2020.113799. Epub 2020 Jul 2.

Abstract

Information embedded in ligand-binding residues (LBRs) of proteins is important for understanding protein functions. Accurately identifying potential ligand-binding residues remains a challenging problem, especially when only the protein sequence is given. In this paper, we establish a new query-specific computational method, named I-LBR, for identifying LBRs without directly using protein 3D structure information. I-LBR includes two modes, I-LBRGP and I-LBRLS, for general-purpose and ligand-specific LBR identification, respectively. In both modes, I-LBR first constructs a query-specific training subset based on the query sequence information, then uses the support vector machine (SVM) algorithm to learn an LBR identification model, and finally predicts the probability that each residue in the query protein belongs to the LBR class. Experimental results on four testing datasets demonstrate that I-LBRLS is a better choice than I-LBRGP when the ligand type or types that the query protein binds are known. Compared with other state-of-the-art LBR identification methods, I-LBR achieves better or comparable performance. The I-LBR web server and the dataset used in this study are freely available for academic use at https://jun-csbio.github.io/I-LBR.
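The first step of the pipeline, building a query-specific training subset, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says the subset is built from the query sequence information (the keywords mention protein sequence profile alignment), whereas the sketch below swaps in a much simpler stand-in, cosine similarity between amino-acid composition vectors. The function names, the `(name, sequence)` record layout, and the parameter `k` are all hypothetical.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino-acid frequency vector of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_training_subset(query_seq, training_set, k):
    """Return the k training records most similar to the query.

    training_set is a list of (name, sequence) tuples. Similarity here is
    composition-based (an assumption made for illustration); a profile
    alignment score would take its place in a realistic setting.
    """
    query_vec = composition(query_seq)
    ranked = sorted(
        training_set,
        key=lambda record: cosine(query_vec, composition(record[1])),
        reverse=True,
    )
    return ranked[:k]
```

Under these assumptions, the selected records would then supply the residue-level training examples for the SVM, so that each query gets a model trained on the proteins most similar to it rather than on the full training set.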

Keywords: Ligand-specific; Protein sequence profile alignment; Protein-ligand binding residue; Query-specific; SVM.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Binding Sites
  • Databases, Protein
  • Ligands
  • Protein Binding
  • Proteins* / chemistry
  • Proteins* / metabolism
  • Support Vector Machine*

Substances

  • Ligands
  • Proteins