autoBioSeqpy: A Deep Learning Tool for the Classification of Biological Sequences

J Chem Inf Model. 2020 Aug 24;60(8):3755-3764. doi: 10.1021/acs.jcim.0c00409. Epub 2020 Aug 5.

Abstract

Deep learning has proven to be a powerful method with applications in various fields including image, language, and biomedical data. Thanks to the libraries and toolkits such as TensorFlow, PyTorch, and Keras, researchers can use different deep learning architectures and data sets for rapid modeling. However, the available implementations of neural networks using these toolkits are usually designed for a specific research and are difficult to transfer to other work. Here, we present autoBioSeqpy, a tool that uses deep learning for biological sequence classification. The advantage of this tool is its simplicity. Users only need to prepare the input data set and then use a command line interface. Then, autoBioSeqpy automatically executes a series of customizable steps including text reading, parameter initialization, sequence encoding, model loading, training, and evaluation. In addition, the tool provides various ready-to-apply and adapt model templates to improve the usability of these networks. We introduce the application of autoBioSeqpy on three biological sequence problems: the prediction of type III secreted proteins, protein subcellular localization, and CRISPR/Cas9 sgRNA activity. autoBioSeqpy is freely available with examples at https://github.com/jingry/autoBioSeqpy.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Deep Learning*
  • Neural Networks, Computer
  • Protein Transport