Cascaded Algorithm Selection With Extreme-Region UCB Bandit

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6782-6794. doi: 10.1109/TPAMI.2021.3094844. Epub 2022 Sep 14.

Abstract

AutoML aims to configure learning systems automatically for the best performance. Its core subtasks are algorithm selection and hyper-parameter tuning. Previous approaches search the joint hyper-parameter space of all algorithms, which is huge and redundant, making the search inefficient. We tackle this issue with a cascaded algorithm selection approach: an upper-level process selects algorithms, and a lower-level process tunes the hyper-parameters of each algorithm. Since the lower-level process employs an anytime tuning approach, the upper-level process is naturally formulated as a multi-armed bandit that decides which algorithm should be allocated the next unit of time for lower-level tuning. To achieve the goal of finding the best configuration, we propose the Extreme-Region Upper Confidence Bound (ER-UCB) strategy: unlike UCB bandits, which maximize the mean of the feedback distribution, ER-UCB maximizes the extreme region of the feedback distribution. We first consider stationary distributions and propose the ER-UCB-S algorithm, which has an O(K ln n) regret upper bound with K arms and n trials. We then extend to non-stationary settings and propose the ER-UCB-N algorithm, which has an O(Kn^ν) regret upper bound, where [Formula: see text]. Finally, empirical studies on synthetic and AutoML tasks verify the effectiveness of ER-UCB-S/N, which outperform competing methods in their respective settings.
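
To make the cascaded structure concrete, the following is a minimal Python sketch of the upper-level bandit loop, not the paper's implementation: ERUCBSelector, RandomTuner, the mean-plus-deviation tail proxy in the arm score, and the toy objectives are all illustrative assumptions; the actual ER-UCB-S index and its O(K ln n) guarantee are derived in the paper.

```python
import math
import random


class RandomTuner:
    """Hypothetical anytime lower-level tuner: one step of random search
    over a single hyper-parameter in [0, 1]."""

    def __init__(self, objective):
        self.objective = objective  # maps a configuration to a validation score

    def step(self):
        config = random.random()
        return config, self.objective(config)


class ERUCBSelector:
    """Upper-level bandit over algorithms. The score below replaces the
    paper's extreme-region index with a simplified mean-plus-deviation
    proxy plus a standard UCB exploration bonus (an assumption made
    purely for illustration)."""

    def __init__(self, tuners, c=2.0):
        self.tuners = tuners                 # one anytime tuner per algorithm (arm)
        self.rewards = [[] for _ in tuners]  # feedback history per arm
        self.c = c                           # exploration weight

    def score(self, k, n):
        r = self.rewards[k]
        if not r:
            return math.inf                  # pull each arm at least once
        mean = sum(r) / len(r)
        std = math.sqrt(sum((x - mean) ** 2 for x in r) / len(r))
        tail = mean + std                    # favors upper-tail mass, not just a high mean
        bonus = self.c * math.sqrt(2.0 * math.log(n) / len(r))
        return tail + bonus

    def run(self, budget):
        best_reward, best_choice = -math.inf, None
        for n in range(1, budget + 1):
            k = max(range(len(self.tuners)), key=lambda i: self.score(i, n))
            config, reward = self.tuners[k].step()  # one more unit of tuning time
            self.rewards[k].append(reward)
            if reward > best_reward:                # track the best configuration found
                best_reward, best_choice = reward, (k, config)
        return best_reward, best_choice


# Toy usage: two "algorithms" with different hyper-parameter landscapes.
tuners = [
    RandomTuner(lambda x: 0.75 + 0.05 * math.sin(8.0 * x)),
    RandomTuner(lambda x: 0.60 + 0.30 * x),
]
reward, (algo, config) = ERUCBSelector(tuners).run(budget=200)
print(f"best algorithm {algo} with config {config:.3f} scoring {reward:.3f}")
```

The design point mirrored here is the paper's central one: because only the single best configuration matters, the arm score should reward feedback distributions with heavy upper tails rather than high means.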