Majority Vote Cascading: A Semi-Supervised Framework for Improving Protein Function Prediction

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):1933-1945. doi: 10.1109/TCBB.2021.3059812. Epub 2022 Aug 8.

Abstract

We introduce a method to improve protein function prediction in sparsely annotated protein-protein interaction (PPI) networks. The method extends the diffusion state distance (DSD) majority vote algorithm of Cao et al. in two ways: it assigns confidence scores to predicted labels, and it uses high-confidence predictions to predict the labels of other nodes in subsequent rounds. We call this a majority vote cascade. Several cascade variants are tested in a stringent cross-validation experiment on PPI networks from S. cerevisiae and D. melanogaster. We show that, across many settings and several alternative confidence functions, cascading improves the accuracy of the predictions. A list of the most confident new label predictions in the two networks is also reported. Code and networks for the cross-validation experiments appear at http://bcb.cs.tufts.edu/cascade.
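
The cascade idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the confidence function (the fraction of the k nearest labeled nodes that agree with the winning label), the threshold, the value of k, the round limit, the single-label simplification, and the names `vote` and `majority_vote_cascade` are all assumptions made for illustration. The sketch also takes any precomputed distance matrix in place of DSD.

```python
from collections import Counter


def vote(node, dist, labeled, k):
    """Majority vote among the k nearest labeled nodes.

    Returns (label, confidence), where confidence is the fraction of
    voters that agree with the winning label (an assumed confidence
    function, chosen only for illustration).
    """
    voters = sorted(
        (n for n in labeled if n != node),
        key=lambda n: (dist[node][n], n),  # ties broken by node index
    )[:k]
    if not voters:
        return None, 0.0
    counts = Counter(labeled[n] for n in voters)
    # Highest count wins; ties broken alphabetically for determinism.
    label, top = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return label, top / len(voters)


def majority_vote_cascade(dist, known, k=3, threshold=1.0, max_rounds=5):
    """Hypothetical cascade: confident predictions become voters later."""
    labeled = dict(known)
    nodes = range(len(dist))
    for _ in range(max_rounds):
        unlabeled = [n for n in nodes if n not in labeled]
        # All votes in a round use the same labeled set.
        votes = {n: vote(n, dist, labeled, k) for n in unlabeled}
        confident = {
            n: lab
            for n, (lab, conf) in votes.items()
            if lab is not None and conf >= threshold
        }
        if not confident:
            break  # nothing new to propagate
        labeled.update(confident)
    # Nodes never promoted receive a plain vote from the final labeled set.
    return {
        n: labeled[n] if n in labeled else vote(n, dist, labeled, k)[0]
        for n in nodes
        if n not in known
    }


# Toy data: points on a line stand in for a network distance matrix.
positions = [0.0, 1.0, 2.0, 9.0, 10.0, 3.0, 4.0, 6.2]
dist = [[abs(p - q) for q in positions] for p in positions]
known = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}

plain = majority_vote_cascade(dist, known, max_rounds=0)  # no cascading
cascaded = majority_vote_cascade(dist, known)
print(plain)
print(cascaded)
```

In this toy example, nodes 5 and 6 receive unanimous votes in the first round and are promoted to voters. Node 7, which a plain majority vote labels "B", is then labeled "A" because its nearest labeled neighbors now include the promoted nodes. The example only shows the mechanism; it says nothing about whether cascading helps on real networks, which is what the cross-validation experiments evaluate.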

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Drosophila melanogaster*
  • Proteins / metabolism
  • Saccharomyces cerevisiae* / genetics
  • Saccharomyces cerevisiae* / metabolism

Substances

  • Proteins