ToxMVA: An end-to-end multi-view deep autoencoder method for protein toxicity prediction

Comput Biol Med. 2022 Dec;151(Pt B):106322. doi: 10.1016/j.compbiomed.2022.106322. Epub 2022 Nov 17.

Abstract

Effective prediction of protein toxicity is an essential step in the early stage of protein-based drug discovery, as it can greatly speed up the screening of novel drugs and reduce costs. Recently, several relevant datasets have been constructed, and machine learning-based methods have been proposed to predict protein toxicity, showing satisfactory performance. However, previous studies generally concatenate different protein features directly, which may introduce irrelevant information and degrade model performance. In this study, we present ToxMVA, a novel end-to-end deep learning-based method for predicting protein toxicity. Specifically, we first build comprehensive feature profiles of proteins from their primary sequences, covering sequential, physicochemical, and contextual semantic information. Next, an autoencoder network integrates this multi-view information to obtain a more concise and accurate feature representation. Extensive experiments on three datasets demonstrate that ToxMVA achieves superior performance for protein toxicity prediction and is more robust across the three datasets.
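The abstract contrasts direct feature concatenation with autoencoder-based fusion of several sequence-derived views. The sketch below illustrates that general idea under loudly stated assumptions; it is not ToxMVA's implementation, whose architecture and feature set the abstract does not specify. The two views (amino acid composition as a "sequential" view, Kyte-Doolittle hydropathy mean and standard deviation as a "physicochemical" view), the function names `composition_view` and `physicochemical_view`, the `LinearAutoencoder` class, and the toy sequences are all hypothetical. The contextual semantic view is omitted, and the autoencoder here is linear and trained without labels, whereas ToxMVA is described as end-to-end.

```python
import math
import random

# Hypothetical, simplified views; not ToxMVA's actual feature set.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as one physicochemical descriptor.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def composition_view(seq):
    """Sequential view (assumed): fraction of each of the 20 standard residues."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def physicochemical_view(seq):
    """Physicochemical view (assumed): mean and std of hydropathy divided by 4.5."""
    vals = [KD[aa] / 4.5 for aa in seq]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return [mean, std]


class LinearAutoencoder:
    """Hypothetical linear autoencoder: z = W x, x_hat = V z, trained by SGD."""

    def __init__(self, n_in, n_latent, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_latent)]
        self.V = [[rng.gauss(0, 0.1) for _ in range(n_latent)] for _ in range(n_in)]

    def encode(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]

    def decode(self, z):
        return [sum(v * zj for v, zj in zip(row, z)) for row in self.V]

    def loss(self, data):
        # Squared reconstruction error summed over features, averaged over samples.
        total = 0.0
        for x in data:
            x_hat = self.decode(self.encode(x))
            total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
        return total / len(data)

    def fit(self, data, epochs=200, lr=0.05):
        # Per-sample SGD on 0.5 * ||x_hat - x||^2.
        for _ in range(epochs):
            for x in data:
                z = self.encode(x)
                err = [a - b for a, b in zip(self.decode(z), x)]  # dL/dx_hat
                # Gradient w.r.t. the latent code, computed before V changes.
                dz = [sum(self.V[i][j] * err[i] for i in range(len(err)))
                      for j in range(len(z))]
                for i in range(len(self.V)):
                    for j in range(len(z)):
                        self.V[i][j] -= lr * err[i] * z[j]
                for j in range(len(self.W)):
                    for k in range(len(x)):
                        self.W[j][k] -= lr * dz[j] * x[k]
        return self


# Toy usage: concatenate the two views (22 features), then compress to 4 dims.
seqs = ["MKTLLILAVVAAALA", "GSHMCCKRRW", "DEDEKKRRNQ", "LLIVVFFAAM"]
data = [composition_view(s) + physicochemical_view(s) for s in seqs]
ae = LinearAutoencoder(n_in=len(data[0]), n_latent=4)
before = ae.loss(data)
ae.fit(data)
after = ae.loss(data)
fused = [ae.encode(x) for x in data]  # compact fused representations
```

In this sketch the compact codes in `fused` would be passed to a downstream toxicity classifier; an end-to-end variant, closer to what the abstract describes, would instead optimize the reconstruction and classification losses jointly.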

Keywords: Autoencoder network; Multi-view information; Protein toxicity prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Drug Discovery
  • Machine Learning*
  • Proteins*

Substances

  • Proteins