Hierarchical Multiagent Reinforcement Learning for Allocating Guaranteed Display Ads

IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5361-5373. doi: 10.1109/TNNLS.2021.3070484. Epub 2022 Oct 5.

Abstract

In this article, we study the problem of guaranteed display ads (GDAs) allocation, which requires proactively allocating display ads to different impressions to fulfill the impression demands specified in the contracts. Existing methods for this problem either assume that the impressions are static or solely consider a specific ad's benefits. Thus, they are hard to generalize to the industrial production scenario, where the impressions are dynamic and large-scale and overall allocation optimality across all the considered GDAs is required. To bridge this gap, we formulate the problem as a sequential decision-making problem in the scope of multiagent reinforcement learning (MARL), assigning an allocation agent to each ad and coordinating all the agents to allocate GDAs. The inputs are the states of each ad (e.g., the ad's demand and the remaining time steps for displaying it) and the impressions at different time steps, and the outputs are the display ratios of each ad for each impression. Specifically, we propose a novel hierarchical MARL (HMARL) method that creates hierarchies over the agent policies to handle a large number of ads and the dynamics of impressions. HMARL contains: 1) a manager policy that navigates each agent to choose an appropriate subpolicy and 2) a set of subpolicies that let the agents behave diversely, conditioned on their states. Extensive experiments on three real-world data sets from the Tencent advertising platform, with tens of millions of records, demonstrate significant improvements of HMARL over state-of-the-art approaches.
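
As a rough illustration of the hierarchical policy structure described in the abstract, the following minimal sketch shows a manager network selecting a subpolicy for an ad agent and the chosen subpolicy mapping the agent's state to a display ratio. It is an assumption-laden toy example, not the authors' implementation: all class names, dimensions, and hyperparameters (e.g., ManagerPolicy, SubPolicy, state_dim, num_subpolicies) are hypothetical.

# Hypothetical sketch of a hierarchical agent: a manager picks a subpolicy
# index for an ad, and the chosen subpolicy outputs a display ratio in [0, 1].
# Names and dimensions are illustrative only.
import torch
import torch.nn as nn


class SubPolicy(nn.Module):
    """Maps an agent's state to a display ratio for the current impression."""
    def __init__(self, state_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # display ratio constrained to [0, 1]
        )

    def forward(self, state):
        return self.net(state)


class ManagerPolicy(nn.Module):
    """Scores the available subpolicies given an agent's state."""
    def __init__(self, state_dim, num_subpolicies, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_subpolicies),
        )

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))


class HierarchicalAgent(nn.Module):
    """One allocation agent per ad: the manager selects a subpolicy,
    which then outputs the ad's display ratio for the impression."""
    def __init__(self, state_dim, num_subpolicies=4):
        super().__init__()
        self.manager = ManagerPolicy(state_dim, num_subpolicies)
        self.subpolicies = nn.ModuleList(
            SubPolicy(state_dim) for _ in range(num_subpolicies)
        )

    def act(self, state):
        choice = self.manager(state).sample()           # subpolicy index
        ratio = self.subpolicies[choice.item()](state)  # display ratio
        return choice, ratio


# Usage: one agent per ad, each observing its own state vector
# (e.g., remaining demand, remaining time steps, impression features).
if __name__ == "__main__":
    state_dim, num_ads = 8, 3
    agents = [HierarchicalAgent(state_dim) for _ in range(num_ads)]
    states = torch.randn(num_ads, state_dim)
    for agent, s in zip(agents, states):
        idx, ratio = agent.act(s)
        print(f"subpolicy={idx.item()}, display_ratio={ratio.item():.3f}")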