Prediction of aptamer-protein interacting pairs based on sparse autoencoder feature extraction and an ensemble classifier

Math Biosci. 2019 May;311:103-108. doi: 10.1016/j.mbs.2019.01.009. Epub 2019 Mar 15.

Abstract

Aptamer-protein interacting pairs play important roles in physiological functions and structural characterization. Despite the wide range of aptamer applications, identifying aptamer-protein interacting pairs remains challenging, and current approaches are limited. It is therefore important to construct a high-performance prediction model for identifying aptamer-target interacting pairs. In this study, a novel ensemble method is presented that predicts aptamer-protein interacting pairs by integrating sequence characteristics derived from the aptamers and their target proteins. The features extracted for aptamers were the compositions of amino acids and pseudo K-tuple nucleotides. In addition, a sparse autoencoder was used to extract features from the target protein sequences. To remove redundant features, gradient boosting decision tree (GBDT) and incremental feature selection (IFS) methods were used to obtain the optimal combination of sequence features. Based on the 616 selected features, an ensemble of three support vector machine (SVM) sub-classifiers was used to construct the prediction model. Evaluated on an independent dataset, our predictor obtained an accuracy of 75.7%, a Matthews correlation coefficient (MCC) of 0.478, and a Youden's index of 0.538, values superior to those of other existing predictors. The results show that the model can be used to distinguish novel aptamer-protein interacting pairs and to reveal the interrelation between aptamers and proteins.
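Composition-type features turn a variable-length sequence into a fixed-length vector of frequencies. The Python sketch below is an illustration only, not the authors' code: the function names are hypothetical, the aptamers are assumed to be DNA (alphabet ACGT), and only the k-tuple part of pseudo K-tuple nucleotide composition is computed, because the pseudo (sequence-order correlation) terms require physicochemical property tables and weights that this abstract does not specify.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUCLEOTIDES = "ACGT"  # assumption: DNA aptamers; use "ACGU" for RNA


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for ch in seq.upper():
        if ch in counts:  # ignore non-standard symbols
            counts[ch] += 1
            total += 1
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def ktuple_composition(seq, k=2, alphabet=NUCLEOTIDES):
    """Normalized frequency of every length-k word (4**k values for DNA).

    Only the k-tuple part of PseKNC; the pseudo sequence-order
    correlation terms are omitted in this sketch.
    """
    seq = seq.upper()
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(words, 0)
    n_windows = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing non-alphabet symbols
            counts[word] += 1
            n_windows += 1
    return [counts[w] / n_windows if n_windows else 0.0 for w in words]
```

For example, `ktuple_composition("ACGTAC", k=2)` returns 16 frequencies, with 0.4 for "AC" (two of the five dinucleotide windows).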
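A sparse autoencoder learns a compressed hidden representation by reconstructing its input, while a Kullback-Leibler penalty keeps the average activation of each hidden unit near a small target value rho. The abstract does not say which protein encoding was fed to the autoencoder or which hyperparameters were used, so the sketch below assumes each protein is already a fixed-length vector scaled to [0, 1] and chooses rho, beta, the weight-decay factor `lam`, and the learning rate arbitrarily. It uses plain full-batch gradient descent in pure Python to stay self-contained; a real pipeline would more likely use a numerical library.

```python
import math
import random


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sparse_autoencoder(X, n_hidden, rho=0.05, beta=3.0, lam=1e-4,
                             lr=0.05, epochs=200, seed=0):
    """Full-batch gradient descent on a one-hidden-layer sparse autoencoder.

    X: list of equal-length feature vectors with values in [0, 1].
    Loss = mean over samples of half the squared reconstruction error
         + (lam / 2) * sum of squared weights
         + beta * sum_j KL(rho || rho_hat_j).
    Returns (encode, losses); encode maps a vector to hidden activations.
    """
    rnd = random.Random(seed)
    n, d, h = len(X), len(X[0]), n_hidden
    r = math.sqrt(6.0 / (d + h + 1))
    W1 = [[rnd.uniform(-r, r) for _ in range(h)] for _ in range(d)]  # d x h
    W2 = [[rnd.uniform(-r, r) for _ in range(d)] for _ in range(h)]  # h x d
    b1, b2 = [0.0] * h, [0.0] * d

    def encode(x):
        return [_sigmoid(b1[j] + sum(x[i] * W1[i][j] for i in range(d)))
                for j in range(h)]

    losses = []
    for _ in range(epochs):
        H = [encode(x) for x in X]
        R = [[_sigmoid(b2[k] + sum(a[j] * W2[j][k] for j in range(h)))
              for k in range(d)] for a in H]
        # Mean activation per hidden unit, clamped away from 0 and 1 for the logs.
        rho_hat = [min(max(sum(a[j] for a in H) / n, 1e-8), 1 - 1e-8)
                   for j in range(h)]
        recon = sum((R[s][k] - X[s][k]) ** 2
                    for s in range(n) for k in range(d)) / (2 * n)
        decay = lam / 2 * (sum(w * w for row in W1 for w in row)
                           + sum(w * w for row in W2 for w in row))
        kl = sum(rho * math.log(rho / p) + (1 - rho) * math.log((1 - rho) / (1 - p))
                 for p in rho_hat)
        losses.append(recon + decay + beta * kl)

        # Derivative of the sparsity penalty w.r.t. each unit's mean activation.
        sp = [beta * (-rho / p + (1 - rho) / (1 - p)) for p in rho_hat]
        gW1 = [[lam * W1[i][j] for j in range(h)] for i in range(d)]
        gW2 = [[lam * W2[j][k] for k in range(d)] for j in range(h)]
        gb1, gb2 = [0.0] * h, [0.0] * d
        for s in range(n):
            d3 = [(R[s][k] - X[s][k]) * R[s][k] * (1 - R[s][k]) / n for k in range(d)]
            d2 = [(sum(W2[j][k] * d3[k] for k in range(d)) + sp[j] / n)
                  * H[s][j] * (1 - H[s][j]) for j in range(h)]
            for j in range(h):
                gb1[j] += d2[j]
                for k in range(d):
                    gW2[j][k] += H[s][j] * d3[k]
            for k in range(d):
                gb2[k] += d3[k]
            for i in range(d):
                for j in range(h):
                    gW1[i][j] += X[s][i] * d2[j]
        # Gradient step; lists are updated in place so encode sees new weights.
        for i in range(d):
            for j in range(h):
                W1[i][j] -= lr * gW1[i][j]
        for j in range(h):
            b1[j] -= lr * gb1[j]
            for k in range(d):
                W2[j][k] -= lr * gW2[j][k]
        for k in range(d):
            b2[k] -= lr * gb2[k]
    return encode, losses
```

After training, `encode(x)` returns the hidden-layer activations, which would serve as the learned protein features.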
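Incremental feature selection ranks the features once (for instance by the importance scores a GBDT model reports), then evaluates the top-1, top-2, and so on up to all n features, and keeps the best-scoring prefix. In the sketch below the importance scores and the evaluation function are inputs; both function names are hypothetical, and any scorer passed in (such as a toy function in place of cross-validated classifier accuracy) is an assumption rather than the study's setup.

```python
def rank_by_importance(importances):
    """Feature indices sorted by descending importance score."""
    return sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)


def incremental_feature_selection(ranked_features, evaluate):
    """Incremental feature selection (IFS) over a ranked feature list.

    ranked_features: feature indices sorted from most to least important.
    evaluate: callable taking a list of feature indices and returning a
        score to maximize, e.g. cross-validated accuracy on that subset.
    Returns (best_subset, scores), where scores[i] is the score of the
    top-(i + 1) features.
    """
    scores = []
    best_score, best_size = float("-inf"), 0
    for size in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:size])
        scores.append(score)
        if score > best_score:  # strict '>' keeps the smallest best subset
            best_score, best_size = score, size
    return ranked_features[:best_size], scores
```

Ranking once and growing the subset one feature at a time keeps the search linear in the number of features, instead of exploring every possible combination.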
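The abstract does not state how the three SVM sub-classifiers are combined. Majority voting is one common rule, and the sketch below assumes it. For a linear kernel, a trained SVM predicts with the sign of w · x + b; the weight vectors in the usage note are made-up placeholders rather than trained models, and `linear_svm_decision` and `majority_vote` are hypothetical helpers, not a library API.

```python
def linear_svm_decision(w, b):
    """Decision function of a trained linear SVM: f(x) = w . x + b."""
    def f(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return f


def majority_vote(decision_functions, x):
    """Return 1 (interacting) if most sub-classifiers give f(x) > 0, else 0."""
    votes = sum(1 for f in decision_functions if f(x) > 0)
    return 1 if 2 * votes > len(decision_functions) else 0
```

With placeholder classifiers `[linear_svm_decision([1.0, -1.0], 0.0), linear_svm_decision([0.5, 0.5], -0.4), linear_svm_decision([-1.0, 2.0], 0.1)]`, the input `[0.9, 0.2]` receives two positive votes out of three and is predicted as 1. An odd number of sub-classifiers, such as three, avoids ties.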
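The three reported figures all derive from the confusion matrix on the test set: accuracy = (TP + TN) / total, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and Youden's index = sensitivity + specificity - 1. The sketch below computes them; the function name is hypothetical, and the counts in the usage note are invented for illustration, not taken from this study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, MCC and Youden's index from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention
    youden = sensitivity + specificity - 1
    return {"accuracy": accuracy, "mcc": mcc, "youden": youden,
            "sensitivity": sensitivity, "specificity": specificity}
```

For example, `binary_metrics(tp=40, fp=20, tn=30, fn=10)` gives accuracy 0.7, sensitivity 0.8, specificity 0.6, a Youden's index of about 0.4, and an MCC of about 0.408. Unlike accuracy, MCC and Youden's index account for both classes, which matters when positive and negative pairs are imbalanced.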

Keywords: Amino acid composition; Ensemble learning; Feature selection; Machine learning; Sparse autoencoder.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids*
  • Aptamers, Nucleotide*
  • Computational Biology
  • Humans
  • Models, Biological*
  • Proteins*
  • Support Vector Machine*

Substances

  • Amino Acids
  • Aptamers, Nucleotide
  • Proteins