Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting

Entropy (Basel). 2020 Jan 23;21(11):1127. doi: 10.3390/e21111127. Epub 2019 Nov 16.

Abstract

Extracting structural information from sequence co-variation has become common practice in computational biology in recent years, mainly due to the availability of large sequence alignments of protein families. However, using sequence-based approaches to identify features that are specific to sub-classes, rather than shared by all members of the family, has remained an elusive problem. Here we present a coevolution-based method to differentially analyze subfamily-specific structural features by a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test the method's predictive capabilities on the Response Regulator family, whose subfamilies have previously been shown to display distinct, specific homo-dimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data are relatively scarce. Furthermore, sequence reweighting allows assessing whether individual structural contacts pertain to specific subfamilies, and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.

Keywords: coevolutionary analysis; direct-coupling analysis; maximum entropy models; protein contact predictions; sequence reweighting; specificity determining contacts.