Continuous Time Reinforcement Learning using Rough Paths

Project: Research project

Project Details


Reinforcement learning (RL) methods have been embraced, in both academic and industrial settings, for solving a range of science and engineering problems involving dynamic system optimization. These problems run the gamut, including optimal resource allocation in ride-sharing, healthcare management, and energy systems; pricing and trading risky assets in finance; and autonomous vehicles and robots. RL is also increasingly used in the physical sciences, for instance to discover new materials or to explore the properties of known ones. Despite the growing importance and breadth of applications, RL methods are often found to perform poorly in practice. RL methods have primarily been developed for discrete-time, so-called Markovian settings, while most real-world problems are better modeled in continuous time with non-Markovian dynamics. The broad adoption of RL methods across science and engineering therefore necessitates investigating how to develop RL methods for continuous-time, non-Markovian settings. This project aims at laying the foundations for addressing these questions. In addition, the PIs have developed specific aims concerning dissemination of discoveries, surveys for graduate students, national and international networking, mentoring of junior researchers as well as graduate and undergraduate students, participation in and organization of events, and interdisciplinary research.

The successful completion of this project will make significant contributions to the theoretical analysis of continuous-time RL. The project will offer a global framework valid for general random environments; in particular, it goes beyond the somewhat restrictive Markov setting and allows for pathwise controls. At its core, this research project aims at developing analytical results that can be used to provide theoretical guarantees for continuous-time RL problems across a range of application domains. Its successful completion will yield a number of new results characterizing the solution of pathwise optimal control in rough environments, the analysis of computational methods for obtaining optimal policies, and the analysis of numerical schemes for approximating policies and value functions using rough path signatures. The proposed efforts have sufficient novelty to open new research areas and will further promote the applicability of the theoretical techniques alluded to above.
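For readers unfamiliar with the signature features mentioned above: the signature of a path is the sequence of its iterated integrals, and for a piecewise-linear path the low-order terms have a closed form obtained by concatenating segments with Chen's identity. The sketch below (the function name and the truncation at level two are illustrative choices, not part of the project) computes the first two signature levels of a sampled path:

```python
import numpy as np

def signature_level2(path):
    """First two signature levels of a piecewise-linear path.

    path: (n_points, d) array of samples of a path in R^d.
    Returns (S1, S2): S1 is the level-1 signature (the total
    increment, shape (d,)); S2 is the level-2 signature of
    iterated integrals (shape (d, d)), built segment by segment
    via Chen's identity. A linear segment with increment dx has
    level-2 signature (dx ⊗ dx) / 2.
    """
    increments = np.diff(np.asarray(path, dtype=float), axis=0)
    d = increments.shape[1]
    S1 = np.zeros(d)
    S2 = np.zeros((d, d))
    for dx in increments:
        # Chen's identity for appending one linear segment:
        # S2 <- S2 + S1 ⊗ dx + (dx ⊗ dx) / 2, then S1 <- S1 + dx
        S2 += np.outer(S1, dx) + 0.5 * np.outer(dx, dx)
        S1 += dx
    return S1, S2
```

For a straight-line path the level-2 term reduces to half the outer product of the total increment; the antisymmetric part of `S2` (the Lévy area) is what distinguishes genuinely rough-path features from plain increments, and truncated signatures of this kind serve as feature maps for approximating path-dependent policies and value functions.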

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Effective start/end date: 8/1/22 to 7/31/25


  • National Science Foundation: $655,634.00

