Effects analysis of reward functions on reinforcement learning for traffic signal control

Hyosun Lee; Yohee Han; Youngchan Kim; Yong Hoon Kim

doi:10.1371/journal.pone.0277813

PLoS One. 2022 Nov 21;17(11):e0277813. doi: 10.1371/journal.pone.0277813. eCollection 2022.

Authors

Hyosun Lee¹, Yohee Han¹, Youngchan Kim¹, Yong Hoon Kim²

Affiliations

¹ Department of Transportation Engineering, University of Seoul, Seoul, Korea.
² Civil and Environmental Engineering, University of Windsor, Windsor, Canada.

Abstract

The increasing traffic demand in urban areas frequently causes traffic congestion, which can be managed only through intelligent traffic signal control. Although many recent studies have addressed reinforcement learning for traffic signal control (RL-TSC), most aim to improve intersection-level performance in virtual simulation. Intersection-level performance indexes are averages weighted by traffic flow; therefore, if the balance among movements is not considered, green time may be overly concentrated on movements with heavy flow rates. Furthermore, because the ultimate purpose of traffic signal control research is to apply these controls to real-world intersections, real-world constraints must be considered. Hence, this study aims to design an RL-TSC model with real-world applicability in mind and to identify an appropriate design for the reward function. The model design accounts for the limitations of real-world detectors and for the dual-ring traffic signal system to facilitate real-world application. To balance traffic movements, the reward function includes the average delay weighted by traffic volume per lane and the entropy of delay. The model is trained at a prototype intersection to ensure scalability to multiple intersections. The pre-trained model is then evaluated on a network with two intersections without additional training. The reward function that considers the equality of traffic movements shows the best performance. Compared to the existing real-time adaptive signal control, the proposed model reduces the average delay by more than 7.4% and 15.0% at the two intersections, respectively.
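The abstract names two reward ingredients, the average delay weighted by traffic volume per lane and the entropy of delay, but does not give their formulas. The following is a minimal sketch of how such a reward could be composed, not the authors' implementation. It assumes that the entropy is the Shannon entropy of each movement's share of total delay, normalized to [0, 1], and that the two terms are combined linearly. The function names and the `balance_weight` parameter are hypothetical.

```python
import math


def weighted_average_delay(delays, volumes_per_lane):
    """Average delay over movements, weighted by traffic volume per lane."""
    total_volume = sum(volumes_per_lane)
    if total_volume == 0:
        return 0.0
    weighted_sum = sum(d * v for d, v in zip(delays, volumes_per_lane))
    return weighted_sum / total_volume


def delay_entropy(delays):
    """Normalized Shannon entropy of the delay shares across movements.

    Assumption: the value is 1.0 when every movement carries the same delay
    and falls toward 0.0 as delay concentrates on a single movement.
    """
    total_delay = sum(delays)
    n = len(delays)
    if total_delay == 0 or n < 2:
        return 1.0  # nothing to balance, treat as perfectly even
    shares = [d / total_delay for d in delays]
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return entropy / math.log(n)


def reward(delays, volumes_per_lane, balance_weight=10.0):
    """Hypothetical reward: penalize delay, reward balance among movements.

    balance_weight is an illustrative constant, not a value from the paper.
    """
    efficiency_term = -weighted_average_delay(delays, volumes_per_lane)
    balance_term = balance_weight * delay_entropy(delays)
    return efficiency_term + balance_term


if __name__ == "__main__":
    volumes = [100, 100, 100, 100]
    balanced = [30.0, 30.0, 30.0, 30.0]
    unbalanced = [10.0, 10.0, 10.0, 90.0]
    print(reward(balanced, volumes), reward(unbalanced, volumes))
```

In this sketch the two delay patterns have the same weighted average delay, so only the entropy term separates them, and the balanced pattern receives the higher reward. This is the kind of distinction an average-only reward cannot make.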

Copyright: © 2022 Lee et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computer Simulation
Reward*

Grants and funding

This work was supported by a Korea Institute of Police Technology (KIPoT) grant funded by the Korea government (KNPA) (092021C29S01000, Development of Traffic Congestion Management System for Urban Network). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.