IIFS: An improved incremental feature selection method for protein sequence processing

Comput Biol Med. 2023 Dec:167:107654. doi: 10.1016/j.compbiomed.2023.107654. Epub 2023 Nov 3.

Abstract

Motivation: Feature extraction methods turn protein sequences into discrete features. These features are the basis of downstream processing of protein data, but they are generally redundant, so the important ones must be screened and selected.

Results: Here, we report IIFS, an improved incremental feature selection method that uses a new subset search strategy to find the optimal feature set. IIFS combines features that are nonadjacent in the sorted order, which avoids both data explosion and excessive reliance on the feature sorting results. Comparative experiments on 27 sets of feature sorting data show that IIFS finds more accurate and more important features than existing methods. IIFS also handles data redundancy more efficiently and finds more representative and discriminatory features, while keeping feature dimensionality minimal and evaluation metrics good. Moreover, the method is wrapped and deployed on a web server, accessible at http://112.124.26.17:8005/.
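For context, the classical incremental feature selection (IFS) baseline that IIFS improves on can be sketched as follows. This is a minimal illustration of the baseline only, not the IIFS search strategy, which the abstract does not specify in detail. The function name `incremental_feature_selection`, the `toy_score` function, and the ranking `[3, 1, 0, 2]` are all hypothetical; in practice the score would typically be a cross-validated classifier metric.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked: Sequence[int],
    score: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Classical IFS baseline: evaluate the top-k prefixes of a ranked list.

    `ranked` holds feature indices sorted from most to least important.
    `score` maps a candidate subset to a quality value (higher is better).
    """
    best_subset: List[int] = []
    best_score = float("-inf")
    current: List[int] = []
    for feature in ranked:
        current.append(feature)       # grow the subset by one ranked feature
        value = score(current)
        if value > best_score:        # strict '>' keeps the smaller subset on ties
            best_subset, best_score = list(current), value
    return best_subset, best_score


def toy_score(subset: List[int]) -> float:
    """Hypothetical stand-in for a cross-validated metric (assumption).

    Features 0 and 3 are treated as informative; every selected feature
    costs a small penalty to reward low dimensionality.
    """
    informative = {0, 3}
    hits = len(informative & set(subset))
    return hits / len(informative) - 0.01 * len(subset)


# Hypothetical ranking in which a redundant feature (1) sits between
# the two informative ones (3 and 0).
ranked_features = [3, 1, 0, 2]
subset, value = incremental_feature_selection(ranked_features, toy_score)
print(subset, round(value, 2))
```

Because the baseline only ever considers prefixes of the ranking, the redundant feature 1 is carried into the selected subset along with features 3 and 0. This is the kind of reliance on the sorting result that the abstract says IIFS is designed to reduce by combining nonadjacent features.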

Keywords: Data redundancy; Incremental feature selection; Protein sequence; Sorting features.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Proteins*

Substances

  • Proteins