A Novel Multi-Ensemble Method for Identifying Essential Proteins

J Comput Biol. 2021 Jul;28(7):637-649. doi: 10.1089/cmb.2020.0527. Epub 2021 Jan 13.

Abstract

Essential proteins perform functions that are critical for cell survival. Identifying them improves our understanding of how a cell works and plays a vital role in research on disease treatment and drug development. Recently, several machine-learning and ensemble learning methods have been proposed to identify essential proteins by introducing effective protein features. However, existing ensemble learning methods have focused only on the choice of base classifiers. In this article, we propose a novel ensemble learning framework, called multi-ensemble, to integrate different base classifiers. Adopting the idea of multi-view learning, the multi-ensemble method selects multiple base classifiers and trains each of them by repeatedly adding the samples that are predicted correctly by the other base classifiers. We applied multi-ensemble to yeast data and Escherichia coli data. The results show that our approach achieved better performance than both the individual classifiers and the other ensemble learning methods.

Keywords: ensemble learning; essential proteins; multi-ensemble; multi-view learning.
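
The training idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the base classifiers, the protein features, or the exact rule for adding samples. The sketch assumes a small labeled seed set, a labeled pool, one toy nearest-centroid classifier per feature column (standing in for a "view"), a rule that adds a pool sample to a classifier's training set when every other classifier predicts it correctly, and a majority vote for the final prediction. All names (NearestCentroid, multi_ensemble_fit, multi_ensemble_predict) and all numbers are hypothetical.

```python
from collections import Counter


class NearestCentroid:
    """Toy base classifier that only looks at the feature columns in `cols`."""

    def __init__(self, cols):
        self.cols = cols
        self.centroids = {}

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(self.cols))
            for i, c in enumerate(self.cols):
                acc[i] += row[c]
        # Per-class mean of the selected columns.
        self.centroids = {k: [v / counts[k] for v in acc] for k, acc in sums.items()}
        return self

    def predict_one(self, row):
        def dist(label):
            return sum((row[c] - m) ** 2
                       for c, m in zip(self.cols, self.centroids[label]))
        return min(sorted(self.centroids), key=dist)


def multi_ensemble_fit(classifiers, X_seed, y_seed, X_pool, y_pool, rounds=3):
    """Assumed rule: classifier i receives pool sample j once all OTHER
    classifiers predict j correctly. Returns the added indices per classifier."""
    added = [set() for _ in classifiers]

    def refit():
        for clf, idx in zip(classifiers, added):
            rows = sorted(idx)
            clf.fit(X_seed + [X_pool[j] for j in rows],
                    y_seed + [y_pool[j] for j in rows])

    refit()
    for _ in range(rounds):
        correct = [[clf.predict_one(x) == y for x, y in zip(X_pool, y_pool)]
                   for clf in classifiers]
        grew = False
        for i, idx in enumerate(added):
            for j in range(len(X_pool)):
                others_ok = all(correct[k][j]
                                for k in range(len(classifiers)) if k != i)
                if others_ok and j not in idx:
                    idx.add(j)
                    grew = True
        if not grew:
            break  # no training set changed, so further rounds are no-ops
        refit()
    return added


def multi_ensemble_predict(classifiers, row):
    """Majority vote over the base classifiers (an assumed combination rule)."""
    votes = Counter(clf.predict_one(row) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Made-up toy data: three features per protein, label 1 = essential, 0 = not.
X_seed = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.1]]
y_seed = [1, 0]
X_pool = [[0.8, 0.9, 0.9], [0.2, 0.1, 0.3], [0.7, 0.6, 0.8], [0.3, 0.2, 0.2]]
y_pool = [1, 0, 1, 0]

classifiers = [NearestCentroid([0]), NearestCentroid([1]), NearestCentroid([2])]
added = multi_ensemble_fit(classifiers, X_seed, y_seed, X_pool, y_pool)
prediction_high = multi_ensemble_predict(classifiers, [0.85, 0.75, 0.8])
prediction_low = multi_ensemble_predict(classifiers, [0.15, 0.1, 0.2])
```

In practice the base classifiers would be stronger learners trained on real protein features, and the addition rule could be relaxed, for example to "at least one other classifier is correct". Those choices are left open here because the abstract does not state them.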

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Escherichia coli / metabolism*
  • Escherichia coli Proteins / metabolism
  • Fungal Proteins / metabolism
  • Genes, Essential
  • Machine Learning
  • Proteins / analysis*
  • Yeasts / metabolism*

Substances

  • Escherichia coli Proteins
  • Fungal Proteins
  • Proteins