TY - GEN
T1 - Learning traffic signal control from demonstrations
AU - Xiong, Yuanhao
AU - Xu, Kai
AU - Zheng, Guanjie
AU - Li, Zhenhui
N1 - Funding Information:
The work was supported in part by NSF awards #1652525 and #1618448. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agencies.
Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/11/3
Y1 - 2019/11/3
N2 - Reinforcement learning (RL) has recently become a promising approach to various decision-making tasks, and traffic signal control is one area where RL has made great breakthroughs. However, these methods often suffer from a prominent exploration problem and may even fail to converge. To resolve this issue, we draw an analogy between agents and humans: agents can learn from demonstrations generated by traditional traffic signal control methods, much as people master a skill from expert knowledge. We therefore propose DemoLight, the first method to leverage demonstrations collected from classic methods to accelerate learning. Based on the state-of-the-art deep RL method Advantage Actor-Critic (A2C), training with demonstrations is carried out for both the actor and the critic, followed by reinforcement learning for further improvement. Results on real-world datasets show that DemoLight enables more efficient exploration and outperforms existing baselines with faster convergence and better performance.
AB - Reinforcement learning (RL) has recently become a promising approach to various decision-making tasks, and traffic signal control is one area where RL has made great breakthroughs. However, these methods often suffer from a prominent exploration problem and may even fail to converge. To resolve this issue, we draw an analogy between agents and humans: agents can learn from demonstrations generated by traditional traffic signal control methods, much as people master a skill from expert knowledge. We therefore propose DemoLight, the first method to leverage demonstrations collected from classic methods to accelerate learning. Based on the state-of-the-art deep RL method Advantage Actor-Critic (A2C), training with demonstrations is carried out for both the actor and the critic, followed by reinforcement learning for further improvement. Results on real-world datasets show that DemoLight enables more efficient exploration and outperforms existing baselines with faster convergence and better performance.
UR - http://www.scopus.com/inward/record.url?scp=85075457757&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075457757&partnerID=8YFLogxK
U2 - 10.1145/3357384.3358079
DO - 10.1145/3357384.3358079
M3 - Conference contribution
AN - SCOPUS:85075457757
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2289
EP - 2292
BT - CIKM 2019 - Proceedings of the 28th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 28th ACM International Conference on Information and Knowledge Management, CIKM 2019
Y2 - 3 November 2019 through 7 November 2019
ER -