Predicting Label Distribution From Tie-Allowed Multi-Label Ranking

IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15364-15379. doi: 10.1109/TPAMI.2023.3300310. Epub 2023 Nov 3.

Abstract

Label distributions convey more information about label polysemy than logical labels do. There are currently two approaches to obtaining label distributions: LDL (label distribution learning) and LE (label enhancement). In LDL, experts annotate training instances with label distributions, and a predictive function trained on this training set outputs label distributions. In LE, experts annotate instances with logical labels, and label distributions are recovered from them. However, LDL is limited by expensive annotation, and LE offers no performance guarantee. We therefore investigate how to predict label distributions from TMLR (tie-allowed multi-label ranking), which strikes a compromise in annotation cost while still offering good performance guarantees. On the one hand, we theoretically dissect the relationship between TMLR and label distributions: we define EAE (expected approximation error) to quantify the quality of an annotation, provide EAE bounds for TMLR, and derive the optimal range of label distributions corresponding to a given TMLR annotation. On the other hand, we propose a framework for predicting label distributions from TMLR via conditional Dirichlet mixtures. This framework blends the recovery and learning of label distributions end-to-end and lets us easily encode our prior knowledge through a semi-adaptive scoring function. Extensive experiments validate our proposal.
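To make the link between a label distribution and a TMLR annotation concrete, the following is a minimal sketch, not the authors' method. It assumes that a TMLR annotation orders labels by description degree and places labels with equal degree in the same tier, and it ignores any relevant/irrelevant split the paper may use. The names `induced_ranking` and `is_consistent`, the tier representation, and the tolerance `tol` are all hypothetical, introduced only for illustration.

```python
from typing import Dict, List

# Hypothetical representation: a tie-allowed ranking is a list of tiers,
# tiers[0] holding the most descriptive labels; labels in one tier are tied.
Ranking = List[List[str]]


def induced_ranking(dist: Dict[str, float], tol: float = 1e-9) -> Ranking:
    """Group labels into tiers by descending description degree.

    A label joins the current tier if its degree is within `tol` of the
    degree of the first label placed in that tier.
    """
    # sorted() is stable, so tied labels keep their insertion order.
    items = sorted(dist.items(), key=lambda kv: -kv[1])
    tiers: Ranking = []
    tier_value = None
    for label, value in items:
        if tier_value is not None and abs(tier_value - value) <= tol:
            tiers[-1].append(label)
        else:
            tiers.append([label])
            tier_value = value
    return tiers


def is_consistent(dist: Dict[str, float], ranking: Ranking, tol: float = 1e-9) -> bool:
    """Check whether a label distribution agrees with a tie-allowed ranking.

    Assumed semantics: degrees are non-negative and sum to 1, tied labels
    share one degree, and each tier lies strictly below the tier above it.
    """
    if min(dist.values()) < 0 or abs(sum(dist.values()) - 1.0) > 1e-6:
        return False  # not a valid label distribution
    ranked = [label for tier in ranking for label in tier]
    if sorted(ranked) != sorted(dist):
        return False  # the ranking must cover every label exactly once
    previous_min = None
    for tier in ranking:
        values = [dist[label] for label in tier]
        if max(values) - min(values) > tol:
            return False  # tied labels must share (approximately) one degree
        if previous_min is not None and max(values) >= previous_min - tol:
            return False  # a lower tier must be strictly below the one above
        previous_min = min(values)
    return True
```

For example, the distribution `{"sky": 0.4, "cloud": 0.4, "sea": 0.2, "tree": 0.0}` induces the ranking `[["sky", "cloud"], ["sea"], ["tree"]]`. Many other distributions are consistent with that same ranking, which illustrates why a single TMLR annotation corresponds to a range of label distributions rather than to one.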