Data-driven schemes for resolving misspecified MDPs: Asymptotics and error analysis

Hao Jiang, Uday V. Shanbhag

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

We consider the solution of a finite-state infinite horizon Markov Decision Process (MDP) in which both the transition matrix and the cost function are misspecified, the latter in a parametric sense. We consider a data-driven regime in which the learning problem is a stochastic convex optimization problem that resolves misspecification. Via such a framework, we make the following contributions: (1) We first show that a misspecified value iteration scheme converges almost surely to its true counterpart and the mean-squared error after K iterations is O(1/K^(1/2-α)) with 0 < α < 1/2; (2) An analogous asymptotic almost-sure convergence statement is provided for misspecified policy iteration; and (3) Finally, we present a constant steplength misspecified Q-learning scheme and show that a suitable error metric is O(1/K^(1/2-α)) + O(√δ) with 0 < α < 1/2 after K iterations, where δ is a bound on the steplength.
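To illustrate the coupled regime the abstract describes, the sketch below runs a learning step (a stochastic gradient on a convex least-squares problem for the misspecified cost parameter) alongside a value-iteration sweep that uses the current parameter estimate. All dimensions, feature maps, and steplengths are hypothetical choices for illustration, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state, 2-action discounted MDP; sizes are illustrative only.
n_states, n_actions, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, :] is a row of transition probs
theta_true = np.array([1.0, -0.5])                                # unknown cost parameter
features = rng.standard_normal((n_actions, n_states, 2))          # linear cost model c(a, s) = phi(a, s)^T theta


def cost(theta):
    # c[a, s] under the current parameter estimate
    return features @ theta


theta = np.zeros(2)      # misspecified (to-be-learned) parameter
V = np.zeros(n_states)   # value-function iterate

for k in range(1, 5001):
    # Learning step: stochastic gradient on E[(phi^T theta - observed cost)^2]/2,
    # a convex problem, with a diminishing steplength.
    a, s = rng.integers(n_actions), rng.integers(n_states)
    noisy_cost = features[a, s] @ theta_true + 0.01 * rng.standard_normal()
    grad = (features[a, s] @ theta - noisy_cost) * features[a, s]
    theta -= grad / (k + 10)

    # Computational step: one value-iteration sweep with the current estimate.
    V = np.min(cost(theta) + gamma * P @ V, axis=0)

print("theta estimate:", theta, "error:", np.linalg.norm(theta - theta_true))
```

The interleaving mirrors the misspecified value-iteration idea: as the learned parameter converges to its true counterpart, the value iterates inherit that accuracy, which is the mechanism behind the O(1/K^(1/2-α)) mean-squared-error statement in the abstract.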

Original language: English (US)
Title of host publication: 2015 Winter Simulation Conference, WSC 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3801-3812
Number of pages: 12
ISBN (Electronic): 9781467397438
DOIs
State: Published - Feb 16 2016
Event: Winter Simulation Conference, WSC 2015 - Huntington Beach, United States
Duration: Dec 6 2015 - Dec 9 2015

Publication series

Name: Proceedings - Winter Simulation Conference
Volume: 2016-February
ISSN (Print): 0891-7736

Other

Other: Winter Simulation Conference, WSC 2015
Country/Territory: United States
City: Huntington Beach
Period: 12/6/15 - 12/9/15

All Science Journal Classification (ASJC) codes

  • Software
  • Modeling and Simulation
  • Computer Science Applications

