Secondary structure assignment of proteins in the absence of sequence information

Bioinform Adv. 2021 Nov 29;1(1):vbab038. doi: 10.1093/bioadv/vbab038. eCollection 2021.

Abstract

Motivation: The structure of proteins is organized in a hierarchy among which the secondary structure elements, α-helix, β-strand and loop, are the basic bricks. The determination of secondary structure elements usually requires the knowledge of the whole structure. Nevertheless, in numerous experimental circumstances, the protein structure is partially known. The detection of secondary structures from these partial structures is hampered by the lack of information about connecting residues along the primary sequence.

Results: We introduce a new methodology to estimate the secondary structure elements from the values of local distances and angles between the protein atoms. Our method uses a message passing neural network, named Sequoia, which allows the automatic prediction of secondary structure elements from the values of local distances and angles between the protein atoms. This neural network takes as input the topology of the given protein graph, where the vertices are protein residues, and the edges are weighted by values of distances and pseudo-dihedral angles generalizing the backbone angles ϕ and ψ. Any pair of residues, independently of its covalent bonds along the primary sequence of the protein, is tagged with this distance and angle information. Sequoia permits the automatic detection of the secondary structure elements, with an F1-score larger than 80% for most of the cases, when α helices and β strands are predicted. In contrast to the approaches classically used in structural biology, such as DSSP, Sequoia is able to capture the variations of geometry at the interface of adjacent secondary structure element. Due to its general modeling frame, Sequoia is able to handle graphs containing only C α atoms, which is particularly useful on low resolution structural input and in the frame of electron microscopy development.

Availability and implementation: Sequoia source code can be found at https://github.com/Khalife/Sequoia with additional documentation.

Supplementary information: Supplementary data are available at Bioinformatics Advances online.