Centroid Estimation With Guaranteed Efficiency: A General Framework for Weakly Supervised Learning

Chen Gong; Jian Yang; Jane You; Masashi Sugiyama

doi:10.1109/TPAMI.2020.3044997

IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2841-2855. doi: 10.1109/TPAMI.2020.3044997. Epub 2022 May 5.

PMID: 33320809

Abstract

In this paper, we propose a general framework termed centroid estimation with guaranteed efficiency (CEGE) for weakly supervised learning (WSL) with incomplete, inexact, and inaccurate supervision. The core of our framework is an unbiased and statistically efficient risk estimator that is applicable to various forms of weak supervision. Specifically, by decomposing the loss function (e.g., the squared loss and hinge loss) into a label-independent term and a label-dependent term, we find that only the latter is influenced by the weak supervision and that it is related to the centroid of the entire dataset. Therefore, by constructing two auxiliary pseudo-labeled datasets with synthesized labels, we derive an unbiased estimate of the centroid from each of the two auxiliary datasets. These two estimates are then linearly combined with a suitably chosen coefficient, which makes the final combined estimate not only unbiased but also statistically efficient. This improves on some existing methods that address only the unbiasedness of the estimate and ignore its statistical efficiency. The statistical efficiency of the derived estimator is guaranteed: we prove theoretically that it attains the minimum variance when estimating the centroid. Extensive experiments on a large number of benchmark datasets demonstrate that CEGE generally outperforms existing approaches to typical WSL problems, including semi-supervised learning, positive-unlabeled learning, multiple instance learning, and label noise learning.
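The two ideas in the abstract can be illustrated with a minimal sketch; this is not the authors' implementation. It assumes a linear scorer f(x) = w·x, labels in {-1, +1}, and the squared loss (1 - y f(x))^2, for which the label-dependent part of the empirical risk reduces to -2 w·c with centroid c = mean(y x). It also assumes two independent unbiased estimates with known variances; the function names and the weighting formula are illustrative textbook choices and are not taken from the paper.

```python
import numpy as np

def squared_loss_risk(w, X, y):
    """Empirical squared-loss risk of a linear scorer, labels in {-1, +1}."""
    return np.mean((1.0 - y * (X @ w)) ** 2)

def decomposed_risk(w, X, centroid):
    """Same risk, split into a label-independent term and a centroid term.

    Uses y**2 == 1, so (1 - y*f)**2 = 1 + f**2 - 2*y*f.
    """
    label_independent = np.mean(1.0 + (X @ w) ** 2)
    label_dependent = -2.0 * (w @ centroid)  # only this term needs labels
    return label_independent + label_dependent

def min_variance_weight(var_a, var_b, cov_ab=0.0):
    """Weight alpha minimising Var(alpha*a + (1 - alpha)*b) for unbiased a, b."""
    return (var_b - cov_ab) / (var_a + var_b - 2.0 * cov_ab)

rng = np.random.default_rng(0)

# Part 1: the decomposition reproduces the ordinary empirical risk.
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1.0, -1.0)
w = np.array([0.5, -0.2, 0.1])          # arbitrary (hypothetical) weights
centroid = (y[:, None] * X).mean(axis=0)
direct = squared_loss_risk(w, X, y)
split = decomposed_risk(w, X, centroid)

# Part 2: combine two independent unbiased estimates of one scalar.
true_value, var_a, var_b = 1.0, 4.0, 1.0
alpha = min_variance_weight(var_a, var_b)
a = true_value + np.sqrt(var_a) * rng.normal(size=100_000)
b = true_value + np.sqrt(var_b) * rng.normal(size=100_000)
combined = alpha * a + (1.0 - alpha) * b
combined_var = alpha**2 * var_a + (1.0 - alpha) ** 2 * var_b
```

Because the weights sum to one, the combined estimate stays unbiased for any alpha; choosing alpha as above only reduces its variance, here to var_a * var_b / (var_a + var_b), which is below both input variances. In a weakly supervised setting the centroid cannot be computed from true labels as in Part 1, which is why estimates of it are needed in the first place.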

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Benchmarking
Supervised Machine Learning*