Sequence-Based Prediction with Feature Representation Learning and Biological Function Analysis of Channel Proteins

Front Biosci (Landmark Ed). 2022 Jun 2;27(6):177. doi: 10.31083/j.fbl2706177.

Abstract

Background: Channel proteins are proteins that transport molecules across the plasma membrane by free diffusion. Because experimental identification is costly in labor and methods, a computational tool for identifying channel proteins is needed to support biological research on them.

Methods: Seventeen feature coding methods were combined with four machine learning classifiers to generate 68-dimensional probability features (17 × 4 = 68, one feature per coding method and classifier pair). A two-step feature selection strategy was then used to optimize the features, and the final prediction model, M16-LGBM (light gradient boosting machine), was built on the resulting 16-dimensional optimal feature vector.
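
The way probability features are assembled can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the 17 coding methods or the four classifiers, so the two encoders (amino acid composition and dipeptide composition), the toy centroid classifier, the function name probability_features, and the example sequences below are all assumptions chosen for illustration. Each (encoder, classifier) pair contributes one predicted probability per sequence, so 2 encoders and 2 classifiers give 4 features here, just as 17 and 4 would give 68.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: fraction of each of the 400 adjacent residue pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    n = max(len(pairs), 1)
    return [pairs.count(d) / n for d in DIPEPTIDES]


class CentroidClassifier:
    """Toy stand-in for a real classifier (assumption, not from the paper).

    It scores a sample by its relative distance to the two class centroids.
    """

    def __init__(self, p):
        self.p = p  # 1 = Manhattan distance, 2 = Euclidean distance

    def _dist(self, a, b):
        return sum(abs(x - y) ** self.p for x, y in zip(a, b)) ** (1 / self.p)

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        """Pseudo-probability of class 1: near 1 when x is close to centroid 1."""
        d0 = self._dist(x, self.centroids[0])
        d1 = self._dist(x, self.centroids[1])
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


def probability_features(train_seqs, train_labels, query_seqs, encoders, classifiers):
    """Return one row per query sequence with len(encoders) * len(classifiers) columns."""
    columns = []
    for encode in encoders.values():
        X_train = [encode(s) for s in train_seqs]
        X_query = [encode(s) for s in query_seqs]
        for make_clf in classifiers.values():
            clf = make_clf().fit(X_train, train_labels)
            columns.append([clf.predict_proba(x) for x in X_query])
    # Transpose so that each row is the feature vector of one query sequence.
    return [list(row) for row in zip(*columns)]


# Hypothetical toy data: label 1 = hydrophobic-rich, label 0 = charged-rich.
train_seqs = ["LLLLAAAVVV", "AVLLIVAALL", "DDEEKKRRDE", "KRDEEDKKRR"]
train_labels = [1, 1, 0, 0]
query_seqs = ["LLAVVIAL", "EEDKRKDE"]

encoders = {"AAC": aac, "DPC": dpc}
classifiers = {
    "centroid_l1": lambda: CentroidClassifier(p=1),
    "centroid_l2": lambda: CentroidClassifier(p=2),
}

features = probability_features(train_seqs, train_labels, query_seqs, encoders, classifiers)
for seq, row in zip(query_seqs, features):
    print(seq, [round(v, 3) for v in row])
```

In a realistic setting the probabilities for training sequences would usually be produced by cross-validation rather than by scoring the data the classifiers were fitted on, and the resulting feature matrix would then go through feature selection before the final model is trained.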

Results: A new predictor, CAPs-LGBM, was proposed to identify channel proteins effectively.

Conclusions: CAPs-LGBM is the first machine learning predictor of channel proteins whose final prediction model was constructed from protein primary sequences. The classifier performed well on both the training and test sets.

Keywords: PPI network; channel protein; computational prediction; feature selection; light gradient boosting machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computational Biology* / methods
  • Machine Learning
  • Proteins*
  • Support Vector Machine

Substances

  • Proteins