TY - GEN
T1 - Learning traffic signal control from demonstrations
AU - Xiong, Yuanhao
AU - Xu, Kai
AU - Zheng, Guanjie
AU - Li, Zhenhui
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/11/3
Y1 - 2019/11/3
N2 - Reinforcement learning (RL) has recently become a promising approach in various decision-making tasks. Among them, traffic signal control is the one where RL makes a great breakthrough. However, these methods always suffer from the prominent exploration problem and even fail to converge. To resolve this issue, we make an analogy between agents and humans. Agents can learn from demonstrations generated by traditional traffic signal control methods, in the similar way as people master a skill from expert knowledge. Therefore, we propose DemoLight, for the first time, to leverage demonstrations collected from classic methods to accelerate learning. Based on the state-of-the-art deep RL method Advantage Actor-Critic (A2C), training with demos are carried out for both the actor and the critic and reinforcement learning is followed for further improvement. Results under real-world datasets show that DemoLight enables a more efficient exploration and outperforms existing baselines with faster convergence and better performance.
AB - Reinforcement learning (RL) has recently become a promising approach in various decision-making tasks. Among them, traffic signal control is the one where RL makes a great breakthrough. However, these methods always suffer from the prominent exploration problem and even fail to converge. To resolve this issue, we make an analogy between agents and humans. Agents can learn from demonstrations generated by traditional traffic signal control methods, in the similar way as people master a skill from expert knowledge. Therefore, we propose DemoLight, for the first time, to leverage demonstrations collected from classic methods to accelerate learning. Based on the state-of-the-art deep RL method Advantage Actor-Critic (A2C), training with demos are carried out for both the actor and the critic and reinforcement learning is followed for further improvement. Results under real-world datasets show that DemoLight enables a more efficient exploration and outperforms existing baselines with faster convergence and better performance.
UR - http://www.scopus.com/inward/record.url?scp=85075457757&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075457757&partnerID=8YFLogxK
U2 - 10.1145/3357384.3358079
DO - 10.1145/3357384.3358079
M3 - Conference contribution
AN - SCOPUS:85075457757
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2289
EP - 2292
BT - CIKM 2019 - Proceedings of the 28th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 28th ACM International Conference on Information and Knowledge Management, CIKM 2019
Y2 - 3 November 2019 through 7 November 2019
ER -