Machine-Learning-Based Prediction of Cell-Penetrating Peptides and Their Uptake Efficiency with Improved Accuracy

J Proteome Res. 2018 Aug 3;17(8):2715-2726. doi: 10.1021/acs.jproteome.8b00148. Epub 2018 Jul 2.

Abstract

Cell-penetrating peptides (CPPs) can enter cells as a variety of biologically active conjugates and have various biomedical applications. To offset the cost and effort of designing novel CPPs in the laboratory, computational methods are needed to identify candidate CPPs before in vitro experimental studies. We developed a two-layer prediction framework called machine-learning-based prediction of cell-penetrating peptides (MLCPP). The first layer predicts whether a given peptide is a CPP or a non-CPP, whereas the second layer predicts the uptake efficiency of the predicted CPPs. To construct this framework, we employed four different machine-learning methods and five different feature encodings: amino acid composition (AAC), dipeptide composition, amino acid index, composition-transition-distribution, and physicochemical properties (PCPs). In the first layer, hybrid features (a combination of AAC and PCP) with an extremely randomized tree classifier outperformed state-of-the-art predictors in CPP prediction, with an accuracy of 0.896 when tested on independent data sets. In the second layer, hybrid features obtained through a feature selection protocol with a random forest classifier produced an accuracy of 0.725, which is also better than that of state-of-the-art predictors. We anticipate that MLCPP will become a valuable tool for predicting CPPs and their uptake efficiency and might facilitate hypothesis-driven experimental design. The MLCPP server interface, along with the benchmarking and independent data sets, is freely accessible at www.thegleelab.org/MLCPP.
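As an illustration of two of the feature encodings named above and of the two-layer idea, the following is a minimal sketch and not the authors' implementation. The function names (`aac`, `dpc`, `two_layer_predict`) are hypothetical, the classifiers are passed in as placeholder callables rather than trained models, and treating the second layer as a binary high/low uptake call is an assumption made for the example.

```python
from itertools import product
from typing import Callable, Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(peptide: str) -> Dict[str, float]:
    """Amino acid composition: fraction of each standard residue in the peptide."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty peptide")
    return {a: seq.count(a) / len(seq) for a in AMINO_ACIDS}


def dpc(peptide: str) -> Dict[str, float]:
    """Dipeptide composition: fraction of each of the 400 adjacent residue pairs."""
    seq = peptide.upper()
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    total = len(seq) - 1  # number of overlapping adjacent pairs
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return {pair: n / total for pair, n in counts.items()}


def two_layer_predict(
    peptide: str,
    is_cpp: Callable[[Dict[str, float]], bool],          # layer 1 (placeholder)
    uptake_is_high: Callable[[Dict[str, float]], bool],  # layer 2 (placeholder)
) -> str:
    """Run layer 2 only on peptides that layer 1 labels as CPPs."""
    features = aac(peptide)
    if not is_cpp(features):
        return "non-CPP"
    # Assumption: uptake efficiency is reported as a high/low label.
    return "CPP, high uptake" if uptake_is_high(features) else "CPP, low uptake"
```

In practice the two callables would wrap trained models (the abstract reports an extremely randomized tree for the first layer and a random forest for the second), and the feature vector would be the hybrid encoding rather than AAC alone.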

Keywords: cell-penetrating peptides; extremely randomized tree; feature selection; machine learning; random forest; uptake efficiency.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / analysis
  • Animals
  • Cell-Penetrating Peptides / chemistry
  • Cell-Penetrating Peptides / pharmacokinetics*
  • Computational Biology*
  • Drug Design
  • Humans
  • Machine Learning
  • Models, Theoretical
  • Support Vector Machine*

Substances

  • Amino Acids
  • Cell-Penetrating Peptides