TOXIFY: a deep learning approach to classify animal venom proteins

PeerJ. 2019 Jun 28;7:e7200. doi: 10.7717/peerj.7200. eCollection 2019.

Abstract

In the era of next-generation sequencing and shotgun proteomics, sequences of animal toxigenic proteins are being generated faster than traditional empirical methods can verify their toxicity. To facilitate the automation of toxin identification from protein sequences, we trained recurrent neural networks with gated recurrent units on publicly available datasets. The resulting models are available via the novel software package TOXIFY, which allows users to infer the probability that a given protein sequence is a venom protein. TOXIFY is more than 20 times faster and uses over an order of magnitude less memory than previously published methods. Additionally, TOXIFY is more accurate, precise, and sensitive than those methods at classifying venom proteins.
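To illustrate the general idea of scoring a protein sequence with a gated recurrent unit, here is a minimal sketch in plain Python. It is not the TOXIFY implementation: the weights are random and untrained, bias terms are omitted, residues are one-hot encoded purely for illustration, and all names (`venom_probability`, `init_params`, `HIDDEN`) and the example sequence are hypothetical. It only shows how a recurrent state can be updated residue by residue and mapped to a probability between 0 and 1.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
HIDDEN = 8  # hypothetical hidden-state size, chosen only for this sketch


def one_hot(residue):
    """Encode a residue as a 20-dim one-hot vector (all zeros if unknown)."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_params(seed=0):
    """Create random, untrained weights; a real model would learn these."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in ("z", "r", "h"):  # update gate, reset gate, candidate state
        params["W" + gate] = rand_matrix(HIDDEN, len(AMINO_ACIDS))
        params["U" + gate] = rand_matrix(HIDDEN, HIDDEN)
    params["out"] = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
    return params


def gru_step(params, x, h):
    """One GRU update of hidden state h given input x (biases omitted)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(params["Wz"], x), matvec(params["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(params["Wr"], x), matvec(params["Ur"], h))]
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(params["Wh"], x), matvec(params["Uh"], gated))]
    # Blend the old state and the candidate state using the update gate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def venom_probability(params, sequence):
    """Run the GRU over the sequence and squash the final state to (0, 1)."""
    h = [0.0] * HIDDEN
    for residue in sequence.upper():
        h = gru_step(params, one_hot(residue), h)
    logit = sum(w * hi for w, hi in zip(params["out"], h))
    return sigmoid(logit)


if __name__ == "__main__":
    params = init_params()
    # Made-up sequence, used only to exercise the code.
    print(venom_probability(params, "MKTLLLTLVVVTIVCLDLGYT"))
```

With untrained weights the printed value carries no biological meaning. In a trained classifier of this kind, the weights would be fit on labeled venom and non-venom sequences so that the output approximates the probability of the venom class.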

Keywords: Deep learning; Protein classification; Proteome; Transcriptome; Venom.

Grants and funding

This work was supported by a National Science Foundation Graduate Research Fellowship to T. Jeffrey Cole and the East Carolina University Department of Biology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.