CEGSO: Boosting Essential Proteins Prediction by Integrating Protein Complex, Gene Expression, Gene Ontology, Subcellular Localization and Orthology Information

Interdiscip Sci. 2021 Sep;13(3):349-361. doi: 10.1007/s12539-021-00426-7. Epub 2021 Mar 27.

Abstract

Essential proteins are considered indispensable for sustaining normal physiological function and are crucial to drug design and disease diagnosis. Discovering essential proteins is of great importance for revealing molecular mechanisms and biological processes. Because biological experiments are tedious, many computational methods have been developed to discover key proteins by mining features of high-throughput data. Appropriate integration of different kinds of biological information with the protein-protein interaction (PPI) network has proven useful in predicting essential proteins. The main aim of this research is to provide a comprehensive study and review of essential protein identification by multi-source data integration, and to offer guidance for researchers. Current essential protein prediction algorithms are analyzed and compared in detail and tested on benchmark PPI networks. In addition, building on the previous method TEGS (short for network Topology, gene Expression, Gene ontology, and Subcellular localization), we improve the prediction of essential proteins by incorporating known protein complex information, gene expression profiles, Gene Ontology (GO) term information, subcellular localization information, and protein orthology data into the PPI network; the resulting method is named CEGSO. The simulation results show that CEGSO achieves more accurate and robust results than the other compared methods across different test datasets and evaluation measures.
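As a rough illustration of what integrating multi-source data into a ranking can look like, the sketch below combines per-protein scores from several sources into one score and counts how many of the top-ranked proteins are known to be essential. This is not the CEGSO scoring formula, which the abstract does not state: the min-max normalization, the weighted sum, the function names `combine_scores` and `top_k_hits`, the weights, and the toy protein identifiers are all assumptions made for illustration only.

```python
from typing import Dict, Set

# Hypothetical helper: NOT the CEGSO formula, only a generic weighted-sum sketch.
def combine_scores(feature_scores: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize each source's scores, then take a weighted sum per protein."""
    combined: Dict[str, float] = {}
    for source, scores in feature_scores.items():
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for protein, value in scores.items():
            # A constant source carries no ranking information; treat it as 0.
            norm = (value - lo) / span if span > 0 else 0.0
            combined[protein] = combined.get(protein, 0.0) + weights[source] * norm
    return combined

# Hypothetical helper: counts true essential proteins among the k best-ranked.
def top_k_hits(combined: Dict[str, float], essential: Set[str], k: int) -> int:
    # Sort by descending score; break ties by protein name for determinism.
    ranked = sorted(combined, key=lambda p: (-combined[p], p))
    return sum(1 for p in ranked[:k] if p in essential)

# Toy data (invented): two evidence sources for four proteins.
feature_scores = {
    "topology": {"P1": 4.0, "P2": 2.0, "P3": 1.0, "P4": 0.0},
    "orthology": {"P1": 1.0, "P2": 0.0, "P3": 1.0, "P4": 0.0},
}
weights = {"topology": 0.5, "orthology": 0.5}  # assumed equal weighting
known_essential = {"P1", "P2"}                 # assumed reference set

scores = combine_scores(feature_scores, weights)
hits = top_k_hits(scores, known_essential, k=2)
print(scores, hits)
```

Counting known essential proteins among the top-ranked candidates is one common way to compare ranking-based predictors; the measures actually used in the study are not specified in the abstract.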

Keywords: Computational method; Data integration; Essential proteins; High throughput data.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Biological Phenomena*
  • Computational Biology*
  • Gene Expression
  • Gene Ontology
  • Intracellular Space
  • Protein Binding
  • Protein Interaction Maps
  • Proteins / genetics
  • Proteins / metabolism
  • Transcriptome

Substances

  • Proteins