What are the minimal folding seeds in proteins? Experimental and theoretical assessment of secondary structure propensities of small peptide fragments

Chem Sci. 2023 Nov 23;15(2):594-608. doi: 10.1039/d3sc04960d. eCollection 2024 Jan 3.

Abstract

Certain peptide sequences, some of them as short as amino acid triplets, are significantly overpopulated in specific secondary structure motifs in folded protein structures. For example, 74% of the occurrences of the EAM triplet are found in α-helices, and only 3% occur in the extended parts of proteins (typically β-sheets). In contrast, other triplets (such as VIV and IYI) appear predominantly in extended parts (79% and 69%, respectively). To determine whether such preferences are structurally encoded in a particular peptide fragment or appear only at the level of a complex protein structure, NMR, VCD, and ECD experiments were carried out on selected tripeptides: EAM (denoted as pro-'α-helical' in proteins), KAM (α), ALA (α), DIC (α), EKF (α), IYI (pro-β-sheet or, more generally, pro-extended), and VIV (β), as well as on the reference α-helical undecapeptide CATWEAMEKCK. The experimental data were in very good agreement with extensive quantum mechanical conformational sampling. Altogether, we clearly showed that the pro-helical vs. pro-extended propensities already start to emerge at the level of tripeptides and can become fully developed in longer sequences. We postulate that certain short peptide sequences can be considered minimal "folding seeds". Admittedly, the inherent secondary structure propensity can be overruled by the large intramolecular interaction energies within folded and compact protein structures. Still, the correlation of experimental and computational data presented herein suggests that the secondary structure propensity should be considered one of the key factors that may lead to understanding the underlying physico-chemical principles of protein structure and folding from first principles.
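
The triplet statistics quoted above (for example, 74% of EAM occurrences in α-helices) are fractions of occurrences per secondary structure class. The following is a minimal sketch of how such fractions could be tallied, not the authors' actual procedure. It assumes each protein is given as a sequence paired with an equal-length per-residue secondary structure string, with 'H' for helix and 'E' for extended (DSSP-like labels), and it assumes a triplet is counted in a class only when all three residues carry that label. The function name `triplet_propensities` and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def triplet_propensities(records):
    """Return, per triplet, the fraction of occurrences in each class.

    records: iterable of (sequence, ss_string) pairs of equal length.
    Assumed labels: 'H' = helix, 'E' = extended, anything else = other.
    """
    counts = defaultdict(Counter)
    for seq, ss in records:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must match in length")
        # Slide a window of three residues along the chain.
        for i in range(len(seq) - 2):
            triplet = seq[i:i + 3]
            window = ss[i:i + 3]
            # Assumption: a class is assigned only if all three residues agree.
            if window == "HHH":
                label = "helix"
            elif window == "EEE":
                label = "extended"
            else:
                label = "other"
            counts[triplet][label] += 1

    fractions = {}
    for triplet, c in counts.items():
        total = sum(c.values())
        fractions[triplet] = {
            k: c[k] / total for k in ("helix", "extended", "other")
        }
    return fractions


if __name__ == "__main__":
    # Toy, made-up records purely for illustration (not data from the paper).
    toy = [
        ("AEAMK", "CHHHH"),
        ("GEAMG", "CEEEC"),
        ("EAM", "HHH"),
        ("EAMA", "HHHC"),
    ]
    print(triplet_propensities(toy)["EAM"])
```

Applied to a large set of folded structures, the same tally would yield per-triplet percentages of the kind reported in the abstract; the exact numbers depend on the structure set and on how secondary structure is assigned, neither of which is specified here.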