Towards General Function Approximation in Nonstationary Reinforcement Learning

Songtao Feng, Ming Yin, Ruiquan Huang, Yu Xiang Wang, Jing Yang, Yingbin Liang

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Function approximation has achieved significant success in reinforcement learning (RL). Although some progress has been made on the theory of nonstationary RL with function approximation under structural assumptions, results for nonstationary RL with general function approximation remain limited. In this work, we propose LSVI-Nonstationary, a UCB-type algorithm that follows the popular least-squares value iteration (LSVI) framework. LSVI-Nonstationary features a restart mechanism and a new design of the bonus term to handle nonstationarity, and it performs no worse than the existing confidence-set-based algorithm SW-OPEA in [1], which has been shown to outperform existing algorithms for nonstationary linear and tabular MDPs in the small variation budget setting.
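The abstract names the ingredients of LSVI-Nonstationary (least-squares value iteration, an optimistic bonus, and restarts) without spelling out how they interact. As a rough illustration only, here is a minimal Python sketch under simplifying assumptions: a tabular MDP with one-hot features and rewards in [0, 1], a standard count-based bonus beta / sqrt(n + lam) (what the elliptical bonus reduces to for one-hot features), and a fixed restart period W. The function names, the toy environment, and the parameters beta, lam, and W are hypothetical. This is a generic LSVI-with-UCB-and-restarts sketch, not the paper's algorithm, whose new bonus for general function classes is not reproduced here.

```python
import math
from collections import defaultdict


def lsvi_ucb_tabular(data, states, actions, H, beta, lam=1.0):
    """One backward pass of least-squares value iteration with a UCB bonus.

    data[h] holds (s, a, r, s_next) transitions observed at step h, with
    rewards assumed to lie in [0, 1]. With one-hot (tabular) features, the
    ridge-regression solution reduces to a regularized empirical mean, and
    the elliptical bonus beta * sqrt(phi^T Lambda^{-1} phi) reduces to
    beta / sqrt(n(s, a) + lam).
    """
    V_next = {s: 0.0 for s in states}  # V_{H+1} = 0
    Q = [None] * H
    for h in reversed(range(H)):
        target_sum = defaultdict(float)
        count = defaultdict(int)
        for s, a, r, s_next in data[h]:
            # Regression target: reward plus estimated value of the next state
            target_sum[(s, a)] += r + V_next[s_next]
            count[(s, a)] += 1
        Q_h = {}
        for s in states:
            for a in actions:
                n = count[(s, a)]
                estimate = target_sum[(s, a)] / (n + lam)
                bonus = beta / math.sqrt(n + lam)
                # Optimistic value, clipped at the largest possible return H - h
                Q_h[(s, a)] = min(estimate + bonus, H - h)
        Q[h] = Q_h
        V_next = {s: max(Q_h[(s, a)] for a in actions) for s in states}
    return Q


def run_with_restarts(env_step, reset, states, actions, H, K, W, beta):
    """Act greedily w.r.t. the optimistic Q, discarding all data every W episodes."""
    data = [[] for _ in range(H)]
    total_reward = 0.0
    for k in range(K):
        if k % W == 0:
            data = [[] for _ in range(H)]  # restart: forget possibly stale data
        Q = lsvi_ucb_tabular(data, states, actions, H, beta)
        s = reset()
        for h in range(H):
            a = max(actions, key=lambda b: Q[h][(s, b)])
            r, s_next = env_step(k, s, a)
            data[h].append((s, a, r, s_next))
            total_reward += r
            s = s_next
    return total_reward


# Hypothetical toy nonstationary problem: one state, two actions, and the
# rewarding action switches halfway through the K episodes.
K = 40


def toy_step(k, s, a):
    good_action = 0 if k < K // 2 else 1
    return (1.0 if a == good_action else 0.0), 0


total = run_with_restarts(toy_step, lambda: 0, states=[0], actions=[0, 1],
                          H=1, K=K, W=10, beta=1.0)
print(f"total reward over {K} episodes: {total}")
```

In this sketch, restarting every W episodes limits how long transitions collected before the reward switch can bias the estimates; choosing W typically trades estimation error against drift and would depend on how much the environment is allowed to vary.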

Original language: English (US)
Title of host publication: 2024 IEEE International Symposium on Information Theory, ISIT 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9798350382846
State: Published - 2024
Event: 2024 IEEE International Symposium on Information Theory, ISIT 2024 - Athens, Greece
Duration: Jul 7 2024 to Jul 12 2024

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
ISSN (Print): 2157-8095

Conference

Conference: 2024 IEEE International Symposium on Information Theory, ISIT 2024
Country/Territory: Greece
City: Athens
Period: 7/7/24 to 7/12/24

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics
