PiPred - a deep-learning method for prediction of π-helices in protein sequences

Jan Ludwiczak; Aleksander Winski; Antonio Marinho da Silva Neto; Krzysztof Szczepaniak; Vikram Alva; Stanislaw Dunin-Horkawicz

doi:10.1038/s41598-019-43189-4

Sci Rep. 2019 May 3;9(1):6888. doi: 10.1038/s41598-019-43189-4.

Authors

Jan Ludwiczak¹ ², Aleksander Winski¹, Antonio Marinho da Silva Neto¹, Krzysztof Szczepaniak¹, Vikram Alva³, Stanislaw Dunin-Horkawicz⁴

Affiliations

¹ Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland.
² Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Pasteura 3, 02-093, Warsaw, Poland.
³ Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany.
⁴ Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland. s.dunin-horkawicz@cent.uw.edu.pl.

Abstract

Canonical π-helices are short, relatively unstable secondary structure elements found in proteins. They comprise seven or more residues and are present in 15% of all known protein structures, often in functionally important regions such as ligand- and ion-binding sites. Given their similarity to α-helices, the prediction of π-helices is a challenging task and none of the currently available secondary structure prediction methods tackle it. Here, we present PiPred, a neural network-based tool for predicting π-helices in protein sequences. By performing a rigorous benchmark we show that PiPred can detect π-helices with a per-residue precision of 48% and sensitivity of 46%. Interestingly, some of the α-helices mispredicted by PiPred as π-helices exhibit a geometry characteristic of π-helices. Also, despite being trained only with canonical π-helices, PiPred can identify 6-residue-long α/π-bulges. These observations suggest an even higher effective precision of the method and demonstrate that π-helices, α/π-bulges, and other helical deformations may impose similar constraints on sequences. PiPred is freely accessible at: https://toolkit.tuebingen.mpg.de/#/tools/quick2d . A standalone version is available for download at: https://github.com/labstructbioinf/PiPred , where we also provide the CB6133, CB513, CASP10, and CASP11 datasets, commonly used for training and validation of secondary structure prediction methods, with correctly annotated π-helices.
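
The per-residue precision and sensitivity quoted above can be illustrated with a minimal sketch. It is not taken from the PiPred code base: the function name, the example strings, and the use of the DSSP-style label "I" for π-helix residues are assumptions made for illustration only. Precision is the fraction of residues predicted as π-helix that are truly π-helical, and sensitivity is the fraction of true π-helix residues that are recovered.

```python
def per_residue_scores(true_labels, pred_labels, positive="I"):
    """Return (precision, sensitivity) for one secondary-structure class.

    Both arguments are equal-length strings with one label per residue.
    "I" as the pi-helix label follows DSSP notation (an assumption here).
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label strings must have the same length")

    tp = fp = fn = 0
    for t, p in zip(true_labels, pred_labels):
        if p == positive and t == positive:
            tp += 1  # residue correctly predicted as pi-helix
        elif p == positive:
            fp += 1  # predicted pi-helix, annotated as something else
        elif t == positive:
            fn += 1  # annotated pi-helix that the prediction missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, sensitivity


# Hypothetical example: a 7-residue pi-helix predicted two residues too late.
true = "HHHHIIIIIIIHHHH---"
pred = "HHHHHHIIIIIIIHH---"
precision, sensitivity = per_residue_scores(true, pred)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

In this toy case five of the seven predicted π-helix residues overlap the annotated helix, so both scores are 5/7. A benchmark over a full dataset would pool the counts across all sequences before dividing.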

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Computational Biology / methods*
Deep Learning*
Models, Molecular
Protein Conformation, alpha-Helical
Proteins / chemistry*

Substances

Proteins