Abstract
Reinforcement learning (RL) agents must explore their environments to learn optimal policies through trial and error. Because the complexities of the real world are difficult to simulate, there is a growing trend toward training RL agents directly in the real world rather than mostly or entirely in simulation. Safety is therefore paramount when training RL agents directly in the real world. This paper proposes MPC-CDCEM, a model-based RL algorithm that allows the agent to interact with and explore its environment safely without additional assumptions on the system dynamics. The algorithm uses a Model Predictive Control (MPC) framework with a differentiable cross-entropy optimizer. This induces a differentiable policy that accounts for the constraints while addressing the objective mismatch problem in model-based RL algorithms. We evaluate our algorithm in Safety Gym environments and on a practical building energy optimization problem. In both experiments, our algorithm has the fewest constraint violations and achieves rewards comparable to those of baseline constrained RL algorithms.
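To make the planning loop concrete, here is a minimal sketch of a constrained cross-entropy method (CEM) planner used inside a receding-horizon MPC loop. It is not the MPC-CDCEM implementation. The feasibility-first elite ranking (sort sampled plans by total constraint violation, then by cost) is a common constrained-CEM heuristic assumed here. The hard top-k elite selection in this sketch is also not differentiable; a differentiable CEM replaces that step with a smooth relaxation so that gradients can flow through the optimizer. The function names (`rollout`, `constrained_cem`, `toy_dynamics`, `toy_cost`, `toy_constraint`), the 1-D toy problem, the action bounds, and all hyperparameters are hypothetical.

```python
import numpy as np


def rollout(dynamics, cost_fn, constraint_fn, x0, actions):
    """Simulate a batch of action sequences through a (learned) dynamics model.

    actions has shape (n_samples, horizon, action_dim). Returns the summed cost
    and the summed constraint violation for each sample, where a constraint
    value g(x) <= 0 means the state x is safe.
    """
    n_samples, horizon, _ = actions.shape
    x = np.tile(np.asarray(x0, dtype=float), (n_samples, 1))
    cost = np.zeros(n_samples)
    violation = np.zeros(n_samples)
    for t in range(horizon):
        a = actions[:, t]
        x = dynamics(x, a)
        cost += cost_fn(x, a)
        violation += np.maximum(0.0, constraint_fn(x))  # only positive g(x) counts
    return cost, violation


def constrained_cem(dynamics, cost_fn, constraint_fn, x0, horizon, action_dim,
                    n_samples=400, n_elites=20, n_iters=10, seed=0):
    """Hypothetical constrained CEM with feasibility-first (hard top-k) elites."""
    rng = np.random.default_rng(seed)
    mu = np.zeros((horizon, action_dim))
    sigma = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        noise = rng.standard_normal((n_samples, horizon, action_dim))
        samples = np.clip(mu + sigma * noise, -1.0, 1.0)  # assumed bounds [-1, 1]
        cost, violation = rollout(dynamics, cost_fn, constraint_fn, x0, samples)
        # Rank by violation first and cost second, so feasible plans always win.
        order = np.lexsort((cost, violation))
        elites = samples[order[:n_elites]]
        mu = elites.mean(axis=0)
        sigma = elites.std(axis=0) + 1e-6
    return mu


# Hypothetical toy problem: a 1-D integrator that wants to reach x = 1
# but must keep x <= 0.5.
def toy_dynamics(x, a):
    return x + a


def toy_cost(x, a):
    return np.sum((x - 1.0) ** 2, axis=-1) + 0.01 * np.sum(a ** 2, axis=-1)


def toy_constraint(x):
    return x[..., 0] - 0.5


# Receding-horizon MPC: replan at every step and apply only the first action.
x = np.array([0.0])
trajectory = []
for step in range(8):
    plan = constrained_cem(toy_dynamics, toy_cost, toy_constraint, x,
                           horizon=4, action_dim=1, seed=step)
    x = toy_dynamics(x, plan[0])
    trajectory.append(float(x[0]))
print(trajectory)
```

In this toy setting the constraint is linear in the actions, so whenever all elites are feasible their mean plan is feasible too. The applied first action should therefore move the state toward the target while keeping it at, or just below, the limit of 0.5.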
| Original language | English (US) |
|---|---|
| Title of host publication | BuildSys 2022 - Proceedings of the 2022 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 40-48 |
| Number of pages | 9 |
| ISBN (Electronic) | 9781450398909 |
| DOIs | |
| State | Published - Nov 9 2022 |
| Event | 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, BuildSys 2022 - Boston, United States. Duration: Nov 9 2022 → Nov 10 2022 |
Publication series
| Name | BuildSys 2022 - Proceedings of the 2022 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation |
|---|---|
Conference
| Conference | 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, BuildSys 2022 |
|---|---|
| Country/Territory | United States |
| City | Boston |
| Period | 11/9/22 → 11/10/22 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 7 Affordable and Clean Energy
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems
- Renewable Energy, Sustainability and the Environment
- Electrical and Electronic Engineering
- Architecture
- Building and Construction
Fingerprint
Dive into the research topics of 'Constrained differentiable cross-entropy method for safe model-based reinforcement learning'. Together they form a unique fingerprint.