On Gap-Based Lower Bounding Techniques for Best-Arm Identification

Lan V Truong; Jonathan Scarlett

doi:10.3390/e22070788

On Gap-Based Lower Bounding Techniques for Best-Arm Identification

Entropy (Basel). 2020 Jul 20;22(7):788. doi: 10.3390/e22070788.

Authors

Lan V Truong¹, Jonathan Scarlett²

Affiliations

¹ Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK.
² Department of Computer Science & Department of Mathematics, National University of Singapore, Singapore 117418, Singapore.

Abstract

In this paper, we consider techniques for establishing lower bounds on the number of arm pulls for best-arm identification in the multi-armed bandit problem. While a recent divergence-based approach was shown to provide improvements over an older gap-based approach, we show that the latter can be refined to match the former (up to constant factors) in many cases of interest under Bernoulli rewards, including the case that the rewards are bounded away from zero and one. Together with existing upper bounds, this indicates that the divergence-based and gap-based approaches are both effective for establishing sample complexity lower bounds for best-arm identification.

Keywords: PAC learning; best-arm identification; information-theoretic lower bounds; multi-armed bandits.

Grants and funding

R-252-000-A74-281/National Research Foundation Singapore