Network-Scale Traffic Signal Control via Multiagent Reinforcement Learning With Deep Spatiotemporal Attentive Network

Hao Huang; Zhiqun Hu; Zhaoming Lu; Xiangming Wen

doi:10.1109/TCYB.2021.3087228

IEEE Trans Cybern. 2023 Jan;53(1):262-274. doi: 10.1109/TCYB.2021.3087228. Epub 2022 Dec 23.

PMID: 34343099

Abstract

The continuous development of intelligent traffic control systems has a profound influence on urban traffic planning and traffic management. As big data and artificial intelligence continue to evolve, traffic control strategies based on deep reinforcement learning (RL) have proven to be a promising way to improve intersection efficiency and reduce travel time. However, existing algorithms ignore the temporal and spatial characteristics of intersections. In this article, we propose a multiagent RL algorithm based on a deep spatiotemporal attentive neural network (MARL-DSTAN) to determine traffic signal timing in a large-scale road network. In this model, the state representation captures the spatial dependency of the entire road network through a graph convolutional network (GCN) and aggregates information according to the importance of each intersection via an attention mechanism. Meanwhile, to accumulate more valuable samples and improve learning efficiency, a recurrent neural network (RNN) is introduced in the exploration stage to constrain the action search space, replacing fully random exploration. MARL-DSTAN decomposes the large-scale area into multiple base environments, and the agents in each base environment learn under the "centralized training and decentralized execution" paradigm to accelerate convergence. The simulation results show that our algorithm significantly outperforms the fixed timing scheme and several other state-of-the-art baseline RL algorithms.
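
To make the GCN-plus-attention idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard symmetric-normalized graph convolution and a simple dot-product attention over intersections; the abstract does not specify either choice. The road graph, feature values, weight matrix, and query vector are all made up for illustration, and every function name is hypothetical.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, with self-loops added to A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, features, weight):
    """One graph convolution (assumed form): ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]

def attention_pool(hidden, query):
    """Softmax over dot-product scores, then a weighted sum of node embeddings."""
    scores = [sum(h * q for h, q in zip(row, query)) for row in hidden]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    pooled = [sum(w * row[k] for w, row in zip(weights, hidden)) for k in range(dim)]
    return weights, pooled

# Hypothetical corridor of three intersections: 0 - 1 - 2
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Made-up per-intersection state, e.g. (queue length, waiting vehicles)
features = [[4.0, 1.0], [9.0, 3.0], [2.0, 0.0]]
weight = [[0.5, 0.0], [0.0, 0.5]]  # illustrative, not learned
query = [0.1, 0.1]                 # illustrative, not learned

hidden = gcn_layer(adj, features, weight)
weights, pooled = attention_pool(hidden, query)
```

Here `hidden` holds one embedding per intersection after mixing in its neighbors' states, `weights` gives a normalized importance score per intersection, and `pooled` is the importance-weighted summary that an agent could consume as part of its state. In a trained model the weight matrix and query would be learned rather than fixed.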