Learning traffic signal control from demonstrations

Yuanhao Xiong, Kai Xu, Guanjie Zheng, Zhenhui Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

23 Scopus citations


Reinforcement learning (RL) has recently become a promising approach in various decision-making tasks. Among them, traffic signal control is one where RL has made a notable breakthrough. However, these methods suffer from a prominent exploration problem and can even fail to converge. To resolve this issue, we make an analogy between agents and humans. Agents can learn from demonstrations generated by traditional traffic signal control methods, in a similar way to how people master a skill from expert knowledge. Therefore, we propose DemoLight, which, for the first time, leverages demonstrations collected from classic methods to accelerate learning. Based on the state-of-the-art deep RL method Advantage Actor-Critic (A2C), training with demonstrations is carried out for both the actor and the critic, followed by reinforcement learning for further improvement. Results on real-world datasets show that DemoLight enables more efficient exploration and outperforms existing baselines with faster convergence and better performance.

Original language: English (US)
Title of host publication: CIKM 2019 - Proceedings of the 28th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Number of pages: 4
ISBN (Electronic): 9781450369763
State: Published - Nov 3 2019
Event: 28th ACM International Conference on Information and Knowledge Management, CIKM 2019 - Beijing, China
Duration: Nov 3 2019 to Nov 7 2019

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings


Conference: 28th ACM International Conference on Information and Knowledge Management, CIKM 2019

All Science Journal Classification (ASJC) codes

  • Decision Sciences(all)
  • Business, Management and Accounting(all)

