New Bounds on the Accuracy of Majority Voting for Multiclass Classification

IEEE Trans Neural Netw Learn Syst. 2024 Apr 23:PP. doi: 10.1109/TNNLS.2024.3387544. Online ahead of print.

Abstract

Majority voting is a simple mathematical function that returns the most frequently occurring value within a given set. As a popular decision fusion technique (DFT), the majority voting function (MVF) finds applications in resolving conflicts, where several independent voters report their opinions on a classification problem. Despite its importance and its various applications in ensemble learning, data crowdsourcing, remote sensing, and data oracles for blockchains, the accuracy of the MVF for the general multiclass classification problem has remained unknown. In this article, we derive a new upper bound on the accuracy of the MVF for the multiclass classification problem. More specifically, we show that under certain conditions, the error rate of the MVF exponentially decays toward 0 as the number of independent voters increases. Conversely, the error rate of the MVF exponentially grows with the number of voters if these conditions are not met. We first explore the problem for voters with independent and identically distributed (i.i.d.) outputs, where we assume that, given the true classification of the data point, every voter follows the same conditional probability distribution of voting for different classes. Next, we extend our results to encompass independent but nonidentically distributed votes. Using the derived results, we then provide a discussion on the accuracy of truth discovery algorithms. We show that in the best-case scenarios, truth discovery algorithms operate as an amplified MVF: they achieve a small error rate only when the MVF also achieves a small error rate and, conversely, a large error rate when the MVF also achieves a large error rate. However, in the worst-case scenario, truth discovery algorithms may exhibit a significantly higher error rate than the MVF. Finally, we confirm our theoretical results using numerical simulations.
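
The qualitative behavior described in the abstract can be illustrated with a small Monte Carlo sketch for i.i.d. voters. This is not the article's simulation code. The function names, the two vote distributions, and the tie-breaking rule are all hypothetical choices made for illustration. The sketch also assumes that the relevant condition is simply whether the true class is the single most likely vote, which may be simpler than the conditions the article actually derives.

```python
import random
from collections import Counter


def majority_vote(votes):
    """Return the most frequent value in votes.

    Ties are broken by the smallest label (an assumption of this sketch).
    """
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def mvf_error_rate(vote_probs, true_class, n_voters, n_trials, rng):
    """Estimate P(majority vote != true_class) for i.i.d. voters.

    vote_probs[k] is the probability that a single voter reports class k,
    given that the true class is true_class.
    """
    classes = list(range(len(vote_probs)))
    errors = 0
    for _ in range(n_trials):
        votes = rng.choices(classes, weights=vote_probs, k=n_voters)
        if majority_vote(votes) != true_class:
            errors += 1
    return errors / n_trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical conditional vote distributions, true class = 0.
    favorable = [0.5, 0.3, 0.2]    # the true class is the most likely vote
    unfavorable = [0.3, 0.5, 0.2]  # a wrong class is the most likely vote
    for n in (1, 11, 101):
        err_fav = mvf_error_rate(favorable, 0, n, 2000, rng)
        err_unf = mvf_error_rate(unfavorable, 0, n, 2000, rng)
        print(f"voters={n:4d}  favorable={err_fav:.3f}  unfavorable={err_unf:.3f}")
```

Under these assumed distributions, the estimated error rate should shrink as voters are added in the favorable case and rise toward 1 in the unfavorable case. The estimates are noisy and depend on the seed and trial count.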