FEPS: A Tool for Feature Extraction from Protein Sequence

Methods Mol Biol. 2022:2499:65-104. doi: 10.1007/978-1-0716-2317-6_3.

Abstract

Machine learning has become one of the most popular choices for developing computational approaches in protein structural bioinformatics. Extracting features from protein sequence or structure is often a crucial step in developing machine learning-based approaches. Over the years, various sequence, structural, and physicochemical descriptors have been developed for proteins and used to address a range of bioinformatics problems, and several feature extraction tools have been developed to help researchers generate numeric features from protein sequences. Most of these tools are limited in the number of sequences they can handle and require the generated features to be preprocessed before they can be fed to machine learning methods. Here, we present Feature Extraction from Protein Sequences (FEPS), a versatile software package for generating various descriptors from protein sequences. The number of sequences FEPS can handle is limited only by the available computational resources. In addition, because FEPS provides various output formats and can concatenate the generated features, the extracted features require no further processing and are ready to be fed to machine learning methods. FEPS is freely available as an online web server and as a stand-alone toolkit. As a comprehensive toolkit for feature extraction, FEPS will help spur the development of machine learning-based models for various bioinformatics problems.
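
To make the idea of sequence-based feature extraction concrete, the following is a minimal Python sketch of two widely used descriptors, amino acid composition and dipeptide composition, concatenated into a single numeric vector. It is an illustration only and does not reproduce the FEPS code or its interface; the function names and the choice of descriptors are assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature columns are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the fraction of each adjacent residue pair in seq (400 values)."""
    seq = seq.upper()
    total = len(seq) - 1  # number of overlapping pairs
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    if total <= 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def extract_features(seq):
    """Concatenate both descriptors into one 420-value feature vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


# Hypothetical example sequences, used only to show the call pattern.
sequences = {"seq1": "ACAC", "seq2": "MKTAYIAKQR"}
feature_table = {name: extract_features(s) for name, s in sequences.items()}
```

Each entry of feature_table is a fixed-length row of numbers, so the rows can be written out (for example as CSV) and passed to a machine learning library without further reshaping.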

Keywords: Feature extraction; Machine learning; Posttranslational modifications; Protein descriptors; Sequence-based features.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computational Biology* / methods
  • Machine Learning
  • Proteins / chemistry
  • Software*

Substances

  • Proteins