Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition

J Theor Biol. 2017 May 21:421:1-15. doi: 10.1016/j.jtbi.2017.03.023. Epub 2017 Mar 27.

Abstract

Protein fold recognition is an important problem in bioinformatics for predicting the three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition is extracting effective features from amino-acid sequences to obtain better classifiers. In this paper, we propose six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework, PCA-DELM-LDA, to extract feature vectors from the amino-acid sequences. Principal Component Analysis (PCA) is used to reduce the number of extracted features. The extracted feature vectors are used together with the original features to improve the performance of the Deep Extreme Learning Machine (DELM) in the second stage. Four new features are extracted in the second stage and used in the third stage by Linear Discriminant Analysis (LDA) to classify the instances into 27 folds. The proposed framework is applied to the independent and combined feature sets in the SCOP datasets. The experimental results show that the feature vectors extracted in the first stage can improve the performance of DELM in extracting useful new features in the second stage.
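The abstract does not name the six proposed descriptors, so the following is only a generic illustration of what a sequence descriptor does: it maps a variable-length amino-acid sequence to a fixed-length numeric vector that a classifier can consume. The sketch uses plain amino-acid composition, a widely used baseline descriptor, and is not claimed to be one of the paper's six. The function name `amino_acid_composition` and the example sequence are hypothetical.

```python
from collections import Counter

# The 20 standard residues in one-letter code (fixed order defines the vector layout).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Illustrative baseline descriptor only; not one of the paper's six descriptors.
    Non-standard characters (e.g. 'X') are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino-acid residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, chosen only to show the call.
features = amino_acid_composition("MKTAYIAKQR")
```

Vectors of this kind, one per protein, are the sort of input that a dimensionality-reduction step such as PCA and a downstream classifier would operate on.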

Keywords: Extreme learning machine; Feature extraction; Protein descriptor; Protein fold recognition.

MeSH terms

  • Amino Acid Sequence
  • Computational Biology
  • Datasets as Topic
  • Machine Learning*
  • Principal Component Analysis
  • Protein Conformation
  • Protein Folding*
  • Sequence Analysis, Protein*
  • Sequence Homology, Amino Acid