Quantum walks (QWs) have a property that classical random walks (RWs) do not possess, namely the coexistence of linear spreading and localization, and this property is used in a variety of applications. This paper proposes RW- and QW-based algorithms for multi-armed bandit (MAB) problems. We show that, under some settings, the QW-based model achieves higher performance than the corresponding RW-based one by associating the two operations that make MAB problems difficult, exploration and exploitation, with these two behaviors of QWs.
Keywords: bandit algorithm; decision-making; exploration–exploitation trade-off; quantum walk; random walk.