The Information Content of Glutamine-Rich Sequences Define Protein Functional Characteristics

Arpita Sen; Wen-Chieh Hsieh; R Claudio Aguilar

doi:10.1109/JPROC.2016.2613076

Proc IEEE Inst Electr Electron Eng. 2017 Feb;105(2):385-393. doi: 10.1109/JPROC.2016.2613076. Epub 2016 Dec 1.

Authors

Arpita Sen¹,², Wen-Chieh Hsieh¹, R Claudio Aguilar¹

Affiliations

¹ Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA.
² Current address, Dept. of Molecular & Cell Biology, University of California, Berkeley.

Abstract

The presence of abnormally expanded glutamine (Q) repeats within specific proteins (e.g., huntingtin) is the well-established cause of several neurodegenerative diseases, including Huntington disease and spinocerebellar ataxias. However, the impact of "expanded Q" stretches on protein function is not well understood, mostly due to a lack of knowledge about the physiological role of Q repeats and the mechanism by which these repeats achieve functional specificity. Indeed, it is intriguing that regions with such low complexity (low information content) can display exquisite functional specificity, which prompts the question: where is this information stored? Applying biochemical/structural constraints and statistical analysis of protein composition, we identified Q-rich (Q_R) regions present in coiled coils of yeast transcription factors and endocytic proteins. Our analysis indicated the existence of non-Q amino acids differentially enriched in or excluded from Q_R regions in one protein group versus the other. Importantly, when the non-Q amino acids from an endocytic protein were exchanged for the ones enriched in Q_R regions of transcription factors, the resulting protein was unable to localize to the plasma membrane and was instead found in the nucleus. These results indicate that while Q_R repeats can efficiently engage in binding, the non-Q amino acids provide essential specificity information. We speculate that coupling low-complexity regions with information-intensive determinants might be a strategy used in many protein systems involved in different biological processes.
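To make the notion of "low information content" concrete, the following is a minimal sketch, assuming Shannon entropy of residue composition as the complexity measure and a simple sliding-window Q fraction to flag Q-rich stretches. The abstract does not state which measure or thresholds the authors used, so the function names, the window size, the threshold, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per residue) of a sequence's amino-acid composition.

    Assumption: composition-based entropy stands in for "information content";
    this is not necessarily the measure used in the paper.
    """
    n = len(seq)
    counts = Counter(seq)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def q_rich_windows(seq, window=10, min_q_fraction=0.5):
    """Start indices of windows whose glutamine (Q) fraction meets the threshold.

    The default window and threshold are hypothetical illustration values.
    """
    return [
        i
        for i in range(len(seq) - window + 1)
        if seq[i:i + window].count("Q") / window >= min_q_fraction
    ]


# Toy sequences (not real proteins): a pure Q repeat carries no compositional
# information, while interspersed non-Q residues raise the entropy.
pure_q = "QQQQQQQQ"
mixed = "QQAAQQAA"

print(shannon_entropy(pure_q))   # 0.0
print(shannon_entropy(mixed))    # 1.0
print(q_rich_windows("AAAAQQQQQQ", window=4, min_q_fraction=1.0))  # [4, 5, 6]
```

Under these assumptions, a pure Q repeat scores zero bits per residue, which is consistent with the idea in the abstract that any specificity information would have to reside in the interspersed non-Q residues.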

Keywords: amino acid preference; endocytic proteins; glutamine-rich regions; transcription factors.

Grants and funding

R21 CA151961/CA/NCI NIH HHS/United States