TY - GEN
T1 - Data-driven schemes for resolving misspecified MDPs
T2 - Winter Simulation Conference, WSC 2015
AU - Jiang, Hao
AU - Shanbhag, Vinayak V.
PY - 2016/2/16
Y1 - 2016/2/16
N2 - We consider the solution of a finite-state infinite horizon Markov Decision Process (MDP) in which both the transition matrix and the cost function are misspecified, the latter in a parametric sense. We consider a data-driven regime in which the learning problem is a stochastic convex optimization problem that resolves misspecification. Via such a framework, we make the following contributions: (1) We first show that a misspecified value iteration scheme converges almost surely to its true counterpart and that the mean-squared error after K iterations is O(1/K^{1/2-α}) with 0 < α < 1/2; (2) An analogous asymptotic almost-sure convergence statement is provided for misspecified policy iteration; and (3) Finally, we present a constant steplength misspecified Q-learning scheme and show that a suitable error metric is O(1/K^{1/2-α}) + O(√δ) with 0 < α < 1/2 after K iterations, where δ is a bound on the steplength.
UR - http://www.scopus.com/inward/record.url?scp=84962903669&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84962903669&partnerID=8YFLogxK
U2 - 10.1109/WSC.2015.7408537
DO - 10.1109/WSC.2015.7408537
M3 - Conference contribution
AN - SCOPUS:84962903669
T3 - Proceedings - Winter Simulation Conference
SP - 3801
EP - 3812
BT - 2015 Winter Simulation Conference, WSC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 December 2015 through 9 December 2015
ER -