Prediction of unconventional protein secretion by exosomes

BMC Bioinformatics. 2021 Jun 16;22(1):333. doi: 10.1186/s12859-021-04219-z.

Abstract

Motivation: In eukaryotes, proteins targeted for secretion contain a signal peptide, which allows them to proceed through the conventional ER/Golgi-dependent pathway. However, a substantial number of proteins lacking a signal peptide can be secreted through unconventional routes, including the route mediated by exosomes. Currently, no method is available to predict protein secretion via exosomes.

Results: We first assembled a dataset comprising the sequences of 2992 proteins secreted by exosomes and 2961 proteins that are not secreted by exosomes. We then trained several random forest models on feature vectors derived from the sequences in this dataset. In tenfold cross-validation, the best model, trained on dipeptide composition, reached an accuracy of 69.88 ± 2.08% and an area under the curve (AUC) of 0.76 ± 0.03. On an independent dataset, this model reached an accuracy of 75.73% and an AUC of 0.840. Based on these results, we developed ExoPred, a web-based tool that uses random forests to predict protein secretion by exosomes.
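
For illustration, a dipeptide composition feature vector of the kind mentioned in the Results can be sketched as follows. This is a minimal sketch, not the ExoPred implementation: the function name `dipeptide_composition`, the restriction to the 20 standard amino acids, and the normalization by the number of valid overlapping pairs are all assumptions, since the abstract does not give the exact feature definition.

```python
from itertools import product

# The 20 standard amino acids (assumption: non-standard residues are skipped).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20 x 20 = 400 ordered dipeptides, in a fixed order for the feature vector.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Each entry is the count of one dipeptide divided by the number of
    overlapping residue pairs made up of standard amino acids only.
    This normalization is an illustrative assumption.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or with no valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical example sequence, not taken from the ExoPred datasets.
features = dipeptide_composition("MKVLAAGIVK")
print(len(features))
```

Vectors of this form, one per protein, could then be passed to any random forest implementation (for example, scikit-learn's `RandomForestClassifier`) together with the exosome / non-exosome labels; which implementation and hyperparameters the authors used is not stated in the abstract.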

Conclusion: ExoPred is available for free public use at http://imath.med.ucm.es/exopred/ . Datasets are available at http://imath.med.ucm.es/exopred/datasets/ .

Keywords: Exosomes; Protein secretion; Random forests.

MeSH terms

  • Exosomes* / metabolism
  • Golgi Apparatus / metabolism
  • Protein Sorting Signals
  • Protein Transport
  • Proteins / metabolism

Substances

  • Protein Sorting Signals
  • Proteins