T1 - Data-driven schemes for resolving misspecified MDPs

T2 - Winter Simulation Conference, WSC 2015

AU - Jiang, Hao

AU - Shanbhag, Vinayak V.

PY - 2016/2/16

Y1 - 2016/2/16

N2 - We consider the solution of a finite-state infinite horizon Markov Decision Process (MDP) in which both the transition matrix and the cost function are misspecified, the latter in a parametric sense. We consider a data-driven regime in which the learning problem is a stochastic convex optimization problem that resolves misspecification. Via such a framework, we make the following contributions: (1) We first show that a misspecified value iteration scheme converges almost surely to its true counterpart and the mean-squared error after K iterations is O(1/K^{1/2-α}) with 0 < α < 1/2; (2) An analogous asymptotic almost-sure convergence statement is provided for misspecified policy iteration; and (3) Finally, we present a constant steplength misspecified Q-learning scheme and show that a suitable error metric is O(1/K^{1/2-α}) + O(√δ) with 0 < α < 1/2 after K iterations where δ is a bound on the steplength.

T3 - Proceedings - Winter Simulation Conference

SP - 3801

EP - 3812

BT - 2015 Winter Simulation Conference, WSC 2015

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 6 December 2015 through 9 December 2015

