Constrained differentiable cross-entropy method for safe model-based reinforcement learning

Sam Mottahedi, Gregory S. Pavlak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Reinforcement learning (RL) agents must explore their environments to learn optimal policies through trial and error. Because the complexities of the real world are difficult to simulate, RL agents are increasingly trained directly in the real world rather than mostly or entirely in simulation, which makes safety a paramount concern. This paper proposes MPC-CDCEM, a model-based RL algorithm that allows the agent to interact with the environment and explore safely without additional assumptions on the system dynamics. The algorithm uses a Model Predictive Control (MPC) framework with a differentiable cross-entropy optimizer, which induces a differentiable policy that accounts for the constraints while addressing the objective mismatch problem in model-based RL algorithms. We evaluate the algorithm in Safety Gym environments and on a practical building energy optimization problem. In both experiments, it incurs the fewest constraint violations and achieves rewards comparable to those of baseline constrained RL algorithms.
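To make the planning idea in the abstract concrete, the following is a minimal sketch of a constrained cross-entropy method used as an MPC planner. It is not the MPC-CDCEM algorithm itself: it uses a plain, non-differentiable elite selection (a hard top-k), and the toy one-dimensional dynamics, the names `TARGET`, `X_MAX`, `rollout`, and `constrained_cem`, and all hyperparameters are assumptions chosen for illustration only.

```python
import random

TARGET = 1.5   # hypothetical set point the reward pulls the state toward
X_MAX = 1.0    # hypothetical safety limit on the state


def rollout(x0, actions):
    """Simulate an assumed toy model x' = x + a; return (reward, violation)."""
    x, reward, violation = x0, 0.0, 0.0
    for a in actions:
        x = x + a
        reward -= (x - TARGET) ** 2          # tracking reward
        violation += max(0.0, x - X_MAX)     # amount by which the limit is exceeded
    return reward, violation


def constrained_cem(x0, horizon=5, samples=200, elites=20, iters=15, seed=0):
    """Return the best feasible action sequence found (or the mean as a fallback)."""
    rng = random.Random(seed)
    mean, std = [0.0] * horizon, [1.0] * horizon
    best_seq, best_reward = None, float("-inf")
    for _ in range(iters):
        pop = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(samples)]
        scored = [(rollout(x0, seq), seq) for seq in pop]
        # Rank feasible samples first (by highest reward), then infeasible
        # samples by smallest total violation.
        scored.sort(key=lambda it: (it[0][1] > 0,
                                    it[0][1] if it[0][1] > 0 else -it[0][0]))
        (top_reward, top_violation), top_seq = scored[0]
        if top_violation == 0.0 and top_reward > best_reward:
            best_seq, best_reward = list(top_seq), top_reward
        # Refit the sampling distribution to the elite set.
        elite = [seq for _, seq in scored[:elites]]
        for t in range(horizon):
            col = [seq[t] for seq in elite]
            mean[t] = sum(col) / len(col)
            var = sum((c - mean[t]) ** 2 for c in col) / len(col)
            std[t] = max(1e-3, var ** 0.5)
    return best_seq if best_seq is not None else list(mean)


if __name__ == "__main__":
    plan = constrained_cem(x0=0.0)
    print("first action to apply:", plan[0])
    print("(reward, violation):", rollout(0.0, plan))
```

In an MPC loop, only the first action of the returned plan would be applied before replanning from the next observed state. A differentiable variant would replace the hard elite selection with a smooth relaxation so that gradients can flow through the planner.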

Original language: English (US)
Title of host publication: BuildSys 2022 - Proceedings of the 2022 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation
Publisher: Association for Computing Machinery, Inc
Pages: 40-48
Number of pages: 9
ISBN (Electronic): 9781450398909
State: Published - Nov 9 2022
Event: 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, BuildSys 2022 - Boston, United States
Duration: Nov 9 2022 to Nov 10 2022

Publication series

Name: BuildSys 2022 - Proceedings of the 2022 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation

Conference

Conference: 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, BuildSys 2022
Country/Territory: United States
City: Boston
Period: 11/9/22 to 11/10/22

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Renewable Energy, Sustainability and the Environment
  • Electrical and Electronic Engineering
  • Architecture
  • Building and Construction
