Disordered-Ordered Protein Binary Classification by Circular Dichroism Spectroscopy

Front Mol Biosci. 2022 May 3:9:863141. doi: 10.3389/fmolb.2022.863141. eCollection 2022.

Abstract

Intrinsically disordered proteins lack a stable tertiary structure and form dynamic conformational ensembles due to their characteristic physicochemical properties and amino acid composition. They are abundant in nature and responsible for a large variety of cellular functions. While numerous bioinformatics tools have been developed for in silico disorder prediction in the last decades, there is a need for experimental methods to verify the disordered state. CD spectroscopy is widely used for protein secondary structure analysis. It is usable in a wide concentration range under various buffer conditions. Even without providing high-resolution information, it is especially useful when NMR, X-ray, or other techniques are problematic or one simply needs a fast technique to verify the structure of proteins. Here, we propose an automatized binary disorder-order classification method by analyzing far-UV CD spectroscopy data. The method needs CD data at only three wavelength points, making high-throughput data collection possible. The mathematical analysis applies the k-nearest neighbor algorithm with cosine distance function, which is independent of the spectral amplitude and thus free of concentration determination errors. Moreover, the method can be used even for strong absorbing samples, such as the case of crowded environmental conditions, if the spectrum can be recorded down to the wavelength of 212 nm. We believe the classification method will be useful in identifying disorder and will also facilitate the growth of experimental data in IDP databases. The method is implemented on a webserver and freely available for academic users.

Keywords: CD spectroscopy; disorder identifier; disorder–order classification; intrinsically disordered proteins; machine learning; protein secondary structure.