CoeViz 2: Protein Graphs Derived from Amino Acid Covariance

Front Bioinform. 2021:1:653681. doi: 10.3389/fbinf.2021.653681. Epub 2021 Jun 24.

Abstract

Proteins by and large carry out their molecular functions in a folded state, in which residues that are distant in sequence come together in 3D space to bind a ligand, catalyze a reaction, form a channel, or take part in another concerted macromolecular interaction. It has long been recognized that covariance of amino acids between distant positions within a protein sequence allows for the inference of long-range contacts to facilitate 3D structure modeling. In this work, we investigated whether covariance analysis may also reveal residues involved in the same molecular function. Building upon our previous work, CoeViz, we conducted a large-scale covariance analysis of 7595 non-redundant proteins with resolved 3D structures to assess (1) whether residues with the same function coevolve, (2) which covariance metric captures such couplings better, and (3) how different molecular functions compare in this context. We found that the chi-squared metric is the most informative for the identification of coevolving functional sites, followed by the Pearson correlation-based metric, whereas mutual information is the least informative. Of the seven categories of the most common natural ligands (coenzyme A, dinucleotide, DNA/RNA, heme, metal, nucleoside, and sugar), the trace metal-binding residues display the most prominent coupling, followed by the sugar-binding sites. We also developed a web-based tool, CoeViz 2, that enables the interactive visualization of covarying residues as cliques from a larger protein graph. CoeViz 2 is publicly available at https://research.cchmc.org/CoevLab/.
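
As a rough illustration of two of the covariance metrics named above, the following sketch computes a textbook chi-squared statistic and mutual information for a pair of alignment columns. It is not the CoeViz implementation: the function names, the gap handling (pairs containing "-" are skipped), and the absence of sequence weighting are all assumptions made for this example, and the abstract does not specify the exact formulations used in the study.

```python
import math
from collections import Counter


def column_pair_counts(col_i, col_j):
    """Count joint and marginal residue occurrences for two alignment columns.

    Assumption for this sketch: any pair containing a gap ('-') is ignored.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    joint = Counter(pairs)
    marg_i = Counter(a for a, _ in pairs)
    marg_j = Counter(b for _, b in pairs)
    return joint, marg_i, marg_j, len(pairs)


def chi_squared(col_i, col_j):
    """Textbook chi-squared statistic over the residue contingency table."""
    joint, marg_i, marg_j, n = column_pair_counts(col_i, col_j)
    if n == 0:
        return 0.0
    total = 0.0
    for a, count_a in marg_i.items():
        for b, count_b in marg_j.items():
            expected = count_a * count_b / n
            observed = joint.get((a, b), 0)
            total += (observed - expected) ** 2 / expected
    return total


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    joint, marg_i, marg_j, n = column_pair_counts(col_i, col_j)
    if n == 0:
        return 0.0
    total = 0.0
    for (a, b), count_ab in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        total += (count_ab / n) * math.log2(count_ab * n / (marg_i[a] * marg_j[b]))
    return total


if __name__ == "__main__":
    # Toy columns (hypothetical data): each string holds the residues seen
    # at one alignment position across eight homologous sequences.
    coupled_i, coupled_j = "AAAACCCC", "DDDDEEEE"
    print(chi_squared(coupled_i, coupled_j))
    print(mutual_information(coupled_i, coupled_j))
```

In this toy case the two columns vary in lockstep, so both scores are at their maximum for a two-by-two residue table; columns that vary independently would score zero under both metrics. A Pearson correlation-based score is omitted here because its construction depends on details not given in the abstract.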

Keywords: CoeViz; amino acid covariance; coevolving functional sites; protein ligand binding sites; protein molecular graph.