Programmatically interpretable reinforcement learning

Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, Swarat Chaudhuri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

75 Scopus citations

Abstract

We present a reinforcement learning framework, called Programmatically Interpretable Reinforcement Learning (PIRL), that is designed to generate interpretable and verifiable agent policies. Unlike the popular Deep Reinforcement Learning (DRL) paradigm, which represents policies by neural networks, PIRL represents policies using a high-level, domain-specific programming language. Such programmatic policies have the benefits of being more easily interpreted than neural networks, and being amenable to verification by symbolic methods. We propose a new method, called Neurally Directed Program Search (NDPS), for solving the challenging nonsmooth optimization problem of finding a programmatic policy with maximal reward. NDPS works by first learning a neural policy network using DRL, and then performing a local search over programmatic policies that seeks to minimize a distance from this neural "oracle". We evaluate NDPS on the task of learning to drive a simulated car in the TORCS car-racing environment. We demonstrate that NDPS is able to discover human-readable policies that pass some significant performance bars. We also show that PIRL policies can have smoother trajectories, and can be more easily transferred to environments not encountered during training, than corresponding policies discovered by DRL.
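The idea behind Neurally Directed Program Search, a local search over programmatic policies that minimizes a distance to a trained oracle, can be illustrated with a toy sketch. Nothing below is taken from the paper's implementation: the `oracle` function is a hand-written stand-in for a trained neural policy, the two-coefficient linear template stands in for the domain-specific language, and the names `make_program`, `distance`, `neighbors`, and `local_search` are hypothetical.

```python
import random

def oracle(obs):
    # Hypothetical stand-in for a trained neural policy:
    # a clipped linear steering rule over (track offset, lateral speed).
    x, v = obs
    return max(-1.0, min(1.0, 0.8 * x - 0.3 * v))

def make_program(params):
    # Assumed program template: action = a * x + b * v.
    a, b = params
    return lambda obs: a * obs[0] + b * obs[1]

def distance(params, states):
    # Mean squared difference between program and oracle actions.
    prog = make_program(params)
    return sum((prog(s) - oracle(s)) ** 2 for s in states) / len(states)

def neighbors(params, step):
    # Programs that differ from `params` in exactly one coefficient.
    for i in range(len(params)):
        for delta in (-step, step):
            q = list(params)
            q[i] = round(q[i] + delta, 10)  # keep coefficients on a clean grid
            yield tuple(q)

def local_search(states, start=(0.0, 0.0), step=0.1, max_iters=1000):
    # Greedy hill climbing: move to the best neighbor until none improves.
    best, best_d = start, distance(start, states)
    for _ in range(max_iters):
        cand = min(neighbors(best, step), key=lambda p: distance(p, states))
        cand_d = distance(cand, states)
        if cand_d >= best_d:
            break
        best, best_d = cand, cand_d
    return best, best_d

# Sample states from a range where the oracle is not clipped.
rng = random.Random(0)
states = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(200)]
params, err = local_search(states)
print("coefficients:", params, "distance to oracle:", err)
```

The search only ever queries the oracle for actions on sampled states, so the resulting policy is a short, readable expression rather than a network. A realistic setting would search over program structure as well as coefficients, which this sketch does not attempt.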

Original language: English (US)
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Andreas Krause, Jennifer Dy
Publisher: International Machine Learning Society (IMLS)
Pages: 8024-8033
Number of pages: 10
ISBN (Electronic): 9781510867963
State: Published - 2018
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 to Jul 15 2018

Publication series

Name: 35th International Conference on Machine Learning, ICML 2018
Volume: 11

Other

Other: 35th International Conference on Machine Learning, ICML 2018
Country/Territory: Sweden
City: Stockholm
Period: 7/10/18 to 7/15/18

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software
