In-Pero: Exploiting Deep Learning Embeddings of Protein Sequences to Predict the Localisation of Peroxisomal Proteins

Marco Anteghini; Vitor Martins Dos Santos; Edoardo Saccenti

doi:10.3390/ijms22126409

Int J Mol Sci. 2021 Jun 15;22(12):6409. doi: 10.3390/ijms22126409.

Authors

Marco Anteghini¹,², Vitor Martins Dos Santos¹,², Edoardo Saccenti¹

Affiliations

¹ Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands.
² LifeGlimmer GmbH, 12163 Berlin, Germany.

Abstract

Peroxisomes are ubiquitous membrane-bound organelles, and aberrant localisation of peroxisomal proteins contributes to the pathogenesis of several disorders. Many computational methods assign protein sequences to subcellular compartments, but no tool is specifically tailored to the sub-localisation (matrix vs. membrane) of peroxisomal proteins. We present In-Pero, a new method for predicting the sub-peroxisomal localisation of proteins. In-Pero combines standard machine learning approaches with recently proposed multi-dimensional deep-learning representations of the protein amino-acid sequence. The method is trained and tested using a double cross-validation approach on a curated data set comprising 160 peroxisomal proteins with experimental evidence for sub-peroxisomal localisation, and it achieves a classification accuracy above 0.9 in predicting peroxisomal matrix and membrane proteins. We further show that the approach can be easily adapted (In-Mito) to the prediction of mitochondrial protein localisation, where it outperforms existing tools for certain classes of proteins (matrix and inner-membrane).
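The double cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the random feature matrix stands in for real sequence embeddings, and the embedding dimension, class balance, classifier (logistic regression), hyperparameter grid, and fold counts are all assumptions chosen for illustration. Only the data set size (160 proteins) and the two classes (matrix vs. membrane) come from the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def double_cross_validation(X, y, n_outer=5, n_inner=3, seed=0):
    """Return one accuracy per outer fold (nested / double cross-validation).

    The inner loop selects hyperparameters using only the outer-training
    data; the outer loop scores the selected model on held-out proteins.
    """
    inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=seed + 1)

    # Assumed classifier and grid; the abstract only says "standard machine learning".
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

    # GridSearchCV is refit inside every outer training split by cross_val_score.
    search = GridSearchCV(model, grid, cv=inner, scoring="accuracy")
    return cross_val_score(search, X, y, cv=outer, scoring="accuracy")


# Placeholder data: 160 "proteins" with hypothetical 64-dimensional embeddings.
rng = np.random.default_rng(0)
y = np.array([0] * 80 + [1] * 80)  # 0 = matrix, 1 = membrane (balance is assumed)
X = rng.normal(size=(160, 64)) + 0.5 * y[:, None]  # synthetic class shift

scores = double_cross_validation(X, y)
print("outer-fold accuracies:", np.round(scores, 3))
print("mean accuracy:", round(float(scores.mean()), 3))
```

Keeping hyperparameter selection inside the inner loop means the outer-fold accuracies are not biased by tuning on the test proteins, which matters for a data set this small.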

Keywords: machine learning; neural networks; protein sequence encoding and embedding; sub-mitochondrial localisation; sub-peroxisomal localisation; subcellular localisation.

MeSH terms

Algorithms
Amino Acid Sequence
Deep Learning*
Membrane Proteins / chemistry*
Membrane Proteins / metabolism*
Mitochondrial Proteins / metabolism
Peroxisomes / metabolism*
Protein Transport
Reproducibility of Results
Software*

Substances

Membrane Proteins
Mitochondrial Proteins

Grants and funding

812968/H2020 Marie Skłodowska-Curie Actions