Decision tree classifier based on topological characteristics of subgraph for the mining of protein complexes from large scale PPI networks

Comput Biol Chem. 2023 Oct:106:107935. doi: 10.1016/j.compbiolchem.2023.107935. Epub 2023 Jul 25.

Abstract

The growing availability of large-scale protein interaction data calls for extensive research into cell organization and function at the network level. Bioinformatics and data mining researchers have extensively studied network clustering to examine the structural and functional features of protein-protein interaction (PPI) networks. Over the past two decades, clustering PPI networks has proven useful in numerous studies for identifying functional modules, understanding the roles of previously unknown proteins, and other purposes. Protein complexes are among the essential cellular components that carry out biological activities. Experimental approaches have made the inference of protein complexes more accessible. We offer a novel method that integrates a classification model with local topological data, making protein complex detection more reliable and efficient. This article describes a decision tree classifier based on topological characteristics of subgraphs for mining protein complexes. The proposed graph-based algorithm is an effective and efficient way to identify protein complexes in large-scale PPI networks. Its performance is evaluated on yeast and human PPI networks from the Database of Interacting Proteins (DIP) and the Biological General Repository for Interaction Datasets (BioGRID), using widely accepted benchmark protein complexes from the comprehensive resource of mammalian protein complexes (CORUM) and the comprehensive catalogue of yeast protein complexes (CYC2008). The results indicate that our method can outperform the best-performing supervised, semi-supervised, and unsupervised approaches to protein complex detection.

Keywords: Cluster density; Clustering; Decision tree classifier; Graph; PPI network; Protein complex.
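
The abstract does not list the topological characteristics the classifier uses, so the following is only a minimal sketch of what describing a candidate subgraph by local topology could look like. The feature set (size, edge count, density, mean degree, mean clustering coefficient), the function name `subgraph_features`, and the toy network `toy_ppi` are all assumptions made for illustration and are not taken from the article. Feature vectors of this kind, labeled against benchmark complexes, are the sort of input a decision tree classifier could be trained on; the classifier itself is not shown.

```python
from itertools import combinations


def subgraph_features(adj, nodes):
    """Topological features of the subgraph induced by `nodes`.

    adj: dict mapping each protein to the set of its interaction partners
         (assumed undirected). Hypothetical helper, not the article's code.
    """
    nodes = set(nodes)
    n = len(nodes)
    # Keep only neighbours inside the candidate subgraph; drop self-loops.
    inner = {u: (adj.get(u, set()) & nodes) - {u} for u in nodes}
    # Each undirected edge is counted once from each endpoint.
    edges = sum(len(nb) for nb in inner.values()) // 2
    # Density: fraction of the n*(n-1)/2 possible edges that are present.
    density = 2 * edges / (n * (n - 1)) if n > 1 else 0.0
    mean_degree = 2 * edges / n if n else 0.0

    def local_clustering(u):
        # Fraction of pairs of u's neighbours that are themselves linked.
        nb = inner[u]
        k = len(nb)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(nb, 2) if b in inner[a])
        return 2 * links / (k * (k - 1))

    mean_clustering = sum(local_clustering(u) for u in nodes) / n if n else 0.0
    return {
        "size": n,
        "edges": edges,
        "density": density,
        "mean_degree": mean_degree,
        "mean_clustering": mean_clustering,
    }


# Toy undirected network (made-up protein names): a triangle A-B-C
# with a tail C-D-E attached.
toy_ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

dense_candidate = subgraph_features(toy_ppi, {"A", "B", "C"})
sparse_candidate = subgraph_features(toy_ppi, {"C", "D", "E"})
print(dense_candidate)
print(sparse_candidate)
```

In this toy case the triangle scores higher on density and clustering than the path-like candidate, which is the kind of contrast a tree-based classifier could split on.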

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Computational Biology / methods
  • Decision Trees
  • Fungal Proteins / metabolism
  • Humans
  • Protein Interaction Mapping* / methods
  • Protein Interaction Maps*
  • Saccharomyces cerevisiae / metabolism

Substances

  • Fungal Proteins