UMAP-DBP: An Improved DNA-Binding Proteins Prediction Method Based on Uniform Manifold Approximation and Projection

Protein J. 2021 Aug;40(4):562-575. doi: 10.1007/s10930-021-10011-y. Epub 2021 Jun 27.

Abstract

DNA-binding proteins play a vital role in cellular processes. It is an extremely urgent to develop a high-throughput method for efficiently identifying DNA-binding proteins. According to the current research situation, some methods in machine learning and deep learning show excellent computational speed and accuracy, which are worthy of application. In this work, a novel predictor was proposed to predict DNA binding proteins called UMAP-DBP. Firstly, the feature extraction of primary protein sequence was realized based on physicochemical distance transformation, Profile-based auto-cross covariance and General series correlation pseudo amino acid composition. Secondly, uniform manifold approximation and projection (UMAP) and feature importance score methods were used for feature selection; there is a progressive relationship between them. Finally, the Adaboost operation engine with jackknife test were adopted for predicting DNA-binding proteins. For the jackknife test on the BP1075 and BP594, we obtained an overall accuracy of 82.97% and 82.14%, Cohen's kappa (CK) of 0.66 and 0.64, respectively. The results illustrate that a feasible method has been developed for predicting DNA-binding proteins by UMAP and Adaboost. This is the first study in which UMAP has been successfully applied to identify DNA-binding proteins. All the datasets and codes are accessible at https://github.com/Wang-Jinyue/UMAP-DBP .

Keywords: Adaboost; DNA-binding proteins; Feature importance score; Uniform manifold approximation and projection.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology*
  • DNA-Binding Proteins / chemistry*
  • Predictive Value of Tests
  • Protein Conformation
  • Software*

Substances

  • DNA-Binding Proteins