The Information Content of Glutamine-Rich Sequences Define Protein Functional Characteristics

Proc IEEE Inst Electr Electron Eng. 2017 Feb;105(2):385-393. doi: 10.1109/JPROC.2016.2613076. Epub 2016 Dec 1.

Abstract

The presence of abnormally expanded glutamine (Q) repeats within specific proteins (e.g., huntingtin) is the well-established cause of several neurodegenerative diseases, including Huntington disease and spinocerebellar ataxias. However, the impact of "expanded Q" stretches on protein function is not well understood, mostly due to a lack of knowledge about the physiological role of Q repeats and the mechanism by which these repeats achieve functional specificity. Indeed, it is intriguing that regions with such low complexity (low information content) can display exquisite functional specificity, which prompts the question: where is this information stored? Applying biochemical/structural constraints and statistical analysis of protein composition, we identified Q-rich (QR) regions present in coiled coils of yeast transcription factors and endocytic proteins. Our analysis indicated that certain non-Q amino acids are differentially enriched in, or excluded from, QR regions in one protein group versus the other. Importantly, when the non-Q amino acids of an endocytic protein were exchanged for those enriched in the QR regions of transcription factors, the resulting protein was unable to localize to the plasma membrane and was instead found in the nucleus. These results indicate that while QR repeats can efficiently engage in binding, the non-Q amino acids provide essential specificity information. We speculate that coupling low-complexity regions with information-intensive determinants might be a strategy used in many protein systems involved in different biological processes.
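
As an illustration of what "low information content" can mean for a sequence, the following minimal sketch computes the Shannon entropy of a sequence's amino acid composition and tallies the residues found at its non-Q positions. It is not the authors' analysis pipeline: the function names and the toy sequences are hypothetical, and composition-based entropy is only one of several ways to quantify sequence complexity.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    # H = sum over residues of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def non_q_composition(seq: str) -> dict:
    """Fraction of each residue type among the non-Q positions of seq."""
    non_q = [aa for aa in seq if aa != "Q"]
    counts = Counter(non_q)
    total = len(non_q)
    return {aa: c / total for aa, c in counts.items()} if total else {}


# Hypothetical toy sequences, chosen only to show the contrast.
pure_q = "QQQQQQQQ"   # a single residue type: zero entropy
q_rich = "QQQLQQQQ"   # one non-Q residue interrupts the repeat
diverse = "ACDEFGHI"  # eight distinct residues: 3 bits per residue

for name, seq in [("pure_q", pure_q), ("q_rich", q_rich), ("diverse", diverse)]:
    print(name, round(shannon_entropy(seq), 3), non_q_composition(seq))
```

In this toy setting, a Q-rich stretch has entropy close to zero, so whatever distinguishes one such stretch from another has to reside in the identity of its few non-Q residues, which is the quantity the second function tabulates.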

Keywords: amino acid preference; endocytic proteins; glutamine-rich regions; transcription factors.