Systematic Modeling, Prediction, and Comparison of Domain-Peptide Affinities: Does it Work Effectively With the Peptide QSAR Methodology?

Qian Liu; Jing Lin; Li Wen; Shaozhou Wang; Peng Zhou; Li Mei; Shuyong Shang

doi:10.3389/fgene.2021.800857

Systematic Modeling, Prediction, and Comparison of Domain-Peptide Affinities: Does it Work Effectively With the Peptide QSAR Methodology?

Front Genet. 2022 Jan 14:12:800857. doi: 10.3389/fgene.2021.800857. eCollection 2021.

Authors

Qian Liu¹, Jing Lin¹, Li Wen¹, Shaozhou Wang¹, Peng Zhou¹, Li Mei², Shuyong Shang³

Affiliations

¹ Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China.
² Institute of Culinary, Sichuan Tourism University, Chengdu, China.
³ Institute of Ecological Environment Protection, Chengdu Normal University, Chengdu, China.

Abstract

The protein-protein association in cellular signaling networks (CSNs) often acts as weak, transient, and reversible domain-peptide interaction (DPI), in which a flexible peptide segment on the surface of one protein is recognized and bound by a rigid peptide-recognition domain from another. Reliable modeling and accurate prediction of DPI binding affinities would help to ascertain the diverse biological events involved in CSNs and benefit our understanding of various biological implications underlying DPIs. Traditionally, peptide quantitative structure-activity relationship (pQSAR) has been widely used to model and predict the biological activity of oligopeptides, which employs amino acid descriptors (AADs) to characterize peptide structures at sequence level and then statistically correlate the resulting descriptor vector with observed activity data via regression. However, the QSAR has not yet been widely applied to treat the direct binding behavior of large-scale peptide ligands to their protein receptors. In this work, we attempted to clarify whether the pQSAR methodology can work effectively for modeling and predicting DPI affinities in a high-throughput manner? Over twenty thousand short linear motif (SLiM)-containing peptide segments involved in SH3, PDZ and 14-3-3 domain-medicated CSNs were compiled to define a comprehensive sequence-based data set of DPI affinities, which were represented by the Boehringer light units (BLUs) derived from previous arbitrary light intensity assays following SPOT peptide synthesis. Four sophisticated MLMs (MLMs) were then utilized to perform pQSAR modeling on the set described with different AADs to systematically create a variety of linear and nonlinear predictors, and then verified by rigorous statistical test. It is revealed that the genome-wide DPI events can only be modeled qualitatively or semiquantitatively with traditional pQSAR strategy due to the intrinsic disorder of peptide conformation and the potential interplay between different peptide residues. In addition, the arbitrary BLUs used to characterize DPI affinity values were measured via an indirect approach, which may not very reliable and may involve strong noise, thus leading to a considerable bias in the modeling. The R _prd ² = 0.7 can be considered as the upper limit of external generalization ability of the pQSAR methodology working on large-scale DPI affinity data.

Keywords: amino acid descriptor; computational peptidology; domain-peptide interaction; machine learning; peptide quantitative structure-activity relationship; statistical modeling.