Random Fourier features-based sparse representation classifier for identifying DNA-binding proteins

Comput Biol Med. 2022 Dec;151(Pt A):106268. doi: 10.1016/j.compbiomed.2022.106268. Epub 2022 Nov 9.

Abstract

DNA-binding proteins (DBPs) protect DNA from nuclease hydrolysis, inhibit the action of RNA polymerase, and prevent replication and transcription from occurring simultaneously on the same piece of DNA. Most conventional methods for detecting DBPs are biochemical, but they are time-consuming. In recent years, a variety of machine learning-based methods have been used for large-scale screening of DBPs. To improve the prediction performance for DBPs, we propose a random Fourier features-based sparse representation classifier (RFF-SRC), which randomly maps the features into a high-dimensional space to solve nonlinear classification problems. An L2,1-matrix norm is introduced to obtain a sparse solution of the model. To evaluate its performance, our model is tested on several benchmark data sets of DBPs and 8 UCI data sets. RFF-SRC achieves better performance in the experimental results.
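
As an illustration of the two ingredients named above, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the kernel, the number of random features, or the solver, so the sketch assumes the standard random Fourier feature map for a Gaussian (RBF) kernel; the function names and the parameters `n_components`, `gamma`, and `seed` are hypothetical. The second function computes the L2,1 norm of a matrix, that is, the sum of the Euclidean norms of its rows, which encourages whole rows of a coefficient matrix to be zero when used as a penalty.

```python
import numpy as np


def random_fourier_features(X, n_components=256, gamma=1.0, seed=0):
    """Map X (n_samples, n_features) into a random feature space.

    Assumption: the target kernel is the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2). Inner products of the returned
    features approximate that kernel.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Frequencies sampled from the kernel's spectral density, N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, n_components))
    # Random phase offsets, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)


def l21_norm(A):
    """L2,1 norm: sum over rows of each row's Euclidean norm."""
    return np.sqrt((A ** 2).sum(axis=1)).sum()
```

Under these assumptions, `Z = random_fourier_features(X)` gives a matrix whose product `Z @ Z.T` approximates the RBF kernel matrix of `X`, with the approximation improving as `n_components` grows. A classifier could then be trained on `Z` with `l21_norm` applied to its coefficient matrix as a sparsity penalty.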

Keywords: Biological sequence features; Random features; Sequence classification; Sparse representation-based classifier.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • DNA
  • DNA-Binding Proteins*
  • Machine Learning

Substances

  • DNA-Binding Proteins
  • DNA