Improved inter-residue contact prediction via a hybrid generative model and dynamic loss function

Comput Struct Biotechnol J. 2022 Nov 12;20:6138-6148. doi: 10.1016/j.csbj.2022.11.020. eCollection 2022.

Abstract

Protein contact maps represent spatial pairwise inter-residue interactions, providing a translationally and rotationally invariant topological representation of a protein. Accurate contact map prediction has been a critical driving force for improving protein structure determination. Contact maps can also be used as a stand-alone tool for varied applications, such as prediction of protein-protein interactions, structure-aware thermal stability, or physicochemical properties. We develop a novel hybrid contact map prediction model, CGAN-Cmap, that uses a generative adversarial neural network embedded with a series of modified squeeze-and-excitation residual networks. To exploit features of different dimensions, we introduce two parallel modules. This architecture improves the prediction by increasing receptive fields, suppressing redundant features, and encouraging more meaningful ones from 1D and 2D inputs. We also introduce a new custom dynamic binary cross-entropy loss function to address the input imbalance problem for highly sparse long-range contacts in proteins with insufficient homologs. We evaluate the model's performance on the CASP 11, 12, 13, 14, and CAMEO test sets. CGAN-Cmap outperforms state-of-the-art models, improving the precision of medium- and long-range contacts by at least 3.5%. As a direct comparison between our model and AlphaFold2, the leading available protein structure prediction model, we compare contact maps extracted from AlphaFold2 with contact maps predicted by CGAN-Cmap. The results show that the mean precision of CGAN-Cmap is 1% higher than that of AlphaFold2 for most ranges of contacts. These results demonstrate an efficient approach for highly accurate contact map prediction toward accurate characterization of protein structure, properties, and functions from sequence.
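As an illustration of the two quantities the abstract relies on, the following minimal sketch builds a binary contact map from residue coordinates and scores a predicted map by top-k precision within a sequence-separation range. It is not the CGAN-Cmap implementation. The 8 Å distance cutoff, the separation ranges (medium: 12 to 23 residues, long: 24 or more), and the function names `contact_map` and `top_k_precision` are assumptions based on common conventions in contact prediction benchmarks, not details stated in the abstract.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary contact map from per-residue 3D coordinates.

    Assumption: two residues are "in contact" when their representative
    atoms lie closer than `threshold` angstroms (8.0 is a common choice).
    The diagonal is left at 0.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def top_k_precision(scores, true_map, k, min_sep=24, max_sep=None):
    """Fraction of true contacts among the k highest-scoring pairs.

    Only pairs with min_sep <= |i - j| < max_sep are considered
    (max_sep=None means no upper bound). Assumed ranges: medium is
    min_sep=12, max_sep=24; long is min_sep=24.
    """
    n = len(true_map)
    pairs = []
    for i in range(n):
        for j in range(i + min_sep, n):
            if max_sep is None or j - i < max_sep:
                pairs.append((scores[i][j], i, j))
    pairs.sort(reverse=True)  # highest predicted score first
    top = pairs[:k]
    if not top:
        return 0.0
    return sum(true_map[i][j] for _, i, j in top) / len(top)


# Toy chain: five residues on a straight line, 3.8 angstroms apart.
chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
native = contact_map(chain)

# Translating every coordinate leaves the contact map unchanged,
# because the map depends only on pairwise distances.
shifted = [(x + 10.0, y - 5.0, z + 3.0) for x, y, z in chain]
assert contact_map(shifted) == native
```

Because the map is computed from pairwise distances alone, any rigid translation or rotation of the structure yields the same map, which is the invariance the abstract refers to. Precision is usually reported for the top L/k predicted pairs, where L is the sequence length, so `k` here would be set to values such as `L // 5` or `L // 2`.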

Keywords: AF2, AlphaFold2; CASP; CGAN-Cmap; GAN; GANs, Generative adversarial neural networks; MSA; Protein contact map; ResNet, Residual neural network; SE-Concat.