Protein secondary structure assignment using pc-polyline and convolutional neural network

Proteins. 2021 Mar 29. doi: 10.1002/prot.26079. Online ahead of print.

Abstract

Motivation: The assignment of protein secondary structure elements (SSEs) underpins structural analysis and prediction. The backbone of a protein can be adequately represented by a pc-polyline, a polyline that passes through the centers of its peptide planes. One salient feature of the pc-polyline representation is that the secondary structure of a protein becomes recognizable in a matrix whose elements are the pairwise distances between peptide plane centers. Thus, a pc-polyline could in turn be used to assign SSEs.
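As a minimal illustration of the distance matrix described above, the following Python sketch builds one from a short chain. It rests on assumptions that are not taken from the paper: each peptide plane center is approximated as the midpoint of two consecutive Cα atoms, the coordinates are made up, and the function names `pc_centers` and `distance_matrix` are hypothetical. It is not the p2psse implementation.

```python
import math

def pc_centers(ca_coords):
    """Approximate each peptide plane center as the midpoint of two
    consecutive C-alpha atoms (an assumption made for illustration)."""
    return [
        tuple((a + b) / 2.0 for a, b in zip(p, q))
        for p, q in zip(ca_coords, ca_coords[1:])
    ]

def distance_matrix(points):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    return [[math.dist(p, q) for q in points] for p in points]

# Hypothetical C-alpha coordinates in angstroms, not taken from a real structure.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

centers = pc_centers(ca)         # N residues give N - 1 peptide plane centers
dmat = distance_matrix(centers)  # (N - 1) x (N - 1) matrix

for row in dmat:
    print(" ".join(f"{d:6.2f}" for d in row))
```

A matrix of this kind can be treated as a single-channel image, which is what makes it a natural input for a convolutional network.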

Results: Using a convolutional neural network (CNN), we confirm that a pc-polyline indeed contains enough information for the accurate assignment of the six SSE types: α-helix, β-sheet, β-bulge, 3₁₀-helix, turn, and loop. Applications to three large data sets show that the assignments by our CNN-based p2psse program agree very well with those by dssp and stride, and quite well with those by five other programs. The analyses of these SSE assignments raise some general questions about the characterization of protein secondary structure. In particular, they illustrate the difficulty of giving a quantitative and consistent definition for each of the six SSE types, especially 3₁₀-helix, β-bulge, turn, or loop, in terms of backbone H-bond patterns, backbone dihedral angles, the Cα-polyline, or the pc-polyline. This difficulty suggests that the SSE space, though dominated by the regions for the six SSE types, is to a certain degree continuous.
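One simple way to quantify agreement between two assignment programs is the fraction of residues that receive the same label. The sketch below assumes this per-residue measure and uses made-up one-letter codes and strings. The abstract does not state which agreement measure the paper uses, and the function name `agreement` is hypothetical.

```python
def agreement(a, b):
    """Fraction of positions at which two per-residue SSE strings match."""
    if len(a) != len(b):
        raise ValueError("assignments must cover the same residues")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical assignments for the same 11 residues
# (H = helix, E = strand, T = turn, L = loop; codes are illustrative only).
ours = "LHHHHLLEEEL"
other = "LHHHHTLEEEL"

score = agreement(ours, other)
print(f"per-residue agreement: {score:.3f}")
```

A single scalar such as this hides which SSE types disagree. A per-type breakdown would be more informative for the rarer types, such as 3₁₀-helix and β-bulge.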

Availability: The program is available at https://github.com/wlincong/p2pSSE.

Keywords: convolutional neural network; hydrogen bond; machine learning; peptide plane; protein secondary structure; secondary structure assignment.