Identification of coenzyme-binding proteins with machine learning algorithms

Comput Biol Chem. 2019 Apr:79:185-192. doi: 10.1016/j.compbiolchem.2019.01.014. Epub 2019 Jan 28.

Abstract

Coenzyme-binding proteins play a vital role in cellular metabolic processes such as fatty acid biosynthesis, enzyme and gene regulation, lipid synthesis, particular vesicular traffic, and β-oxidation donation of acyl-CoA esters. Based on the theory of Star Graph Topological Indices (SGTIs) of protein primary sequences, we propose a method and develop a first classification model for predicting proteins with coenzyme-binding properties. To model the properties of coenzyme-binding proteins, we created a dataset of 2897 proteins, of which 456 have coenzyme-binding activity. The SGTIs of the peptide sequences were calculated with the Sequence to Star Network (S2SNet) application. We used the SGTIs as inputs to several classification techniques in the machine learning software Weka. A Random Forest classifier based on 3 features of the embedded and non-embedded graphs was identified as the best predictive model for coenzyme-binding proteins. This model achieved a true positive (TP) rate of 91.7%, a false positive (FP) rate of 7.6%, and an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.971. Predicting new coenzyme-binding proteins with this model could be useful for further drug development or enzyme metabolism research.
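
The three reported metrics can be illustrated with a minimal Python sketch. This is not the authors' Weka workflow: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration. Only the dataset counts (2897 proteins, 456 coenzyme-binding) come from the abstract.

```python
# Illustrative sketch only: toy data, hypothetical helper names.

# Dataset composition as stated in the abstract.
TOTAL_PROTEINS = 2897
POSITIVES = 456                          # coenzyme-binding proteins
NEGATIVES = TOTAL_PROTEINS - POSITIVES   # remaining proteins


def tp_fp_rates(y_true, y_pred):
    """Return (TP rate, FP rate) for binary labels, where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and classifier scores, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tpr, fpr = tp_fp_rates(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"TP rate={tpr:.3f}  FP rate={fpr:.3f}  AUROC={auc:.3f}")
print(f"Positive fraction in the dataset: {POSITIVES / TOTAL_PROTEINS:.3f}")
```

Because only about one protein in six in the dataset is coenzyme-binding, the TP and FP rates, which are computed within each class separately, are more informative here than overall accuracy would be.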

Keywords: Classification model; Coenzyme-binding; Protein sequence; Random Forest; Topological indices.

MeSH terms

  • Acyl Coenzyme A / chemistry*
  • Acyl Coenzyme A / metabolism
  • Esters / chemistry
  • Esters / metabolism
  • Humans
  • Machine Learning*
  • Models, Molecular
  • Molecular Structure
  • Proteins / chemistry*
  • Proteins / metabolism
  • Software

Substances

  • Acyl Coenzyme A
  • Esters
  • Proteins