PFmulDL: a novel strategy enabling multi-class and multi-label protein function annotation by integrating diverse deep learning methods

Comput Biol Med. 2022 Jun:145:105465. doi: 10.1016/j.compbiomed.2022.105465. Epub 2022 Mar 28.

Abstract

Bioinformatic annotation of protein function is essential but extremely sophisticated, which asks for extensive efforts to develop effective prediction method. However, the existing methods tend to amplify the representativeness of the families with large number of proteins by misclassifying the proteins in the families with small number of proteins. That is to say, the ability of the existing methods to annotate proteins in the 'rare classes' remains limited. Herein, a new protein function annotation strategy, PFmulDL, integrating multiple deep learning methods, was thus constructed. First, the recurrent neural network was integrated, for the first time, with the convolutional neural network to facilitate the function annotation. Second, a transfer learning method was introduced to the model construction for further improving the prediction performances. Third, based on the latest data of Gene Ontology, the newly constructed model could annotate the largest number of protein families comparing with the existing methods. Finally, this newly constructed model was found capable of significantly elevating the prediction performance for the 'rare classes' without sacrificing that for the 'major classes'. All in all, due to the emerging requirements on improving the prediction performance for the proteins in 'rare classes', this new strategy would become an essential complement to the existing methods for protein function prediction. All the models and source codes are freely available and open to all users at: https://github.com/idrblab/PFmulDL.

Keywords: Convolutional neural network; Deep learning; Gene ontology; Protein function prediction; Recurrent neural network.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Deep Learning*
  • Molecular Sequence Annotation
  • Neural Networks, Computer
  • Proteins / genetics
  • Proteins / metabolism

Substances

  • Proteins