VacPred: Sequence-based prediction of plant vacuole proteins using machine-learning techniques

J Biosci. 2020;45:106.

Abstract

Subcellular localization prediction of the proteome is one of the major goals of large-scale genome or proteome sequencing projects, helping to define gene functions with the aid of computational modeling techniques. Previously, different methods have been developed for this purpose using multi-label classification systems, and they have achieved a high level of accuracy. However, when validating these methods on our blind dataset of plant vacuole proteins, we observed that they perform poorly, with accuracy values ranging from ~1.3% to 48.5%. These results show that the previously developed methods are not very accurate for plant vacuole protein prediction and thus emphasize the need for a more accurate and reliable algorithm. In this study, we developed various composition-based as well as PSSM-based models and achieved higher accuracy than the previously developed methods. Our best model achieved ~63% accuracy on the blind dataset, which is far better than currently available tools. Furthermore, we have implemented our best models as free GUI-based software called 'VacPred', which is compatible with both Linux and Windows platforms. The software is freely available for download at www.deepaklab.com/vacpred.
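The abstract does not spell out which composition features were used, so the following is only a minimal sketch of what "composition-based" sequence features commonly look like: the percentage of each of the 20 standard amino acids, and the percentage of each of the 400 possible dipeptides. The function names, the handling of non-standard residues, and the choice of these two feature types are assumptions made for illustration, not a description of the VacPred implementation.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _clean(sequence):
    """Upper-case the sequence and drop non-standard residues (an assumption)."""
    return "".join(aa for aa in sequence.upper() if aa in AMINO_ACIDS)


def amino_acid_composition(sequence):
    """Return a 20-element list: percentage of each standard amino acid."""
    seq = _clean(sequence)
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    total = len(seq)
    return [100.0 * seq.count(aa) / total for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return a 400-element list: percentage of each adjacent residue pair.

    Note: because non-standard residues are dropped first, residues on either
    side of a dropped one are treated as adjacent. This is a simplification.
    """
    seq = _clean(sequence)
    if len(seq) < 2:
        raise ValueError("need at least two standard residues")
    total = len(seq) - 1  # number of overlapping pairs
    counts = {}
    for i in range(total):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return [
        100.0 * counts.get(a + b, 0) / total
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


def composition_features(sequence):
    """Concatenate both compositions into one 420-element feature vector."""
    return amino_acid_composition(sequence) + dipeptide_composition(sequence)
```

A vector such as `composition_features("MKTAYIAKQR")` could then be passed to a classifier of the reader's choice; the MeSH terms suggest a support vector machine, but the kernel and parameters are not given in the abstract.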

MeSH terms

  • Benchmarking
  • Computational Biology / methods
  • Databases, Protein
  • Datasets as Topic
  • Plant Cells / metabolism
  • Plant Proteins / classification
  • Plant Proteins / genetics*
  • Plant Proteins / metabolism
  • Proteome / classification
  • Proteome / genetics*
  • Proteome / metabolism
  • ROC Curve
  • Software*
  • Support Vector Machine*
  • Vacuoles / genetics*
  • Vacuoles / metabolism
  • Viridiplantae / genetics*
  • Viridiplantae / metabolism

Substances

  • Plant Proteins
  • Proteome