TY - GEN
T1 - Control regularization for reduced variance reinforcement learning
AU - Cheng, Richard
AU - Verma, Abhinav
AU - Orosz, Gábor
AU - Chaudhuri, Swarat
AU - Yue, Yisong
AU - Burdick, Joel W.
N1 - Publisher Copyright:
© 2019 by the Author(s).
PY - 2019
Y1 - 2019
AB - Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.
UR - http://www.scopus.com/inward/record.url?scp=85078056317&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078056317&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85078056317
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 1940
EP - 1949
BT - 36th International Conference on Machine Learning, ICML 2019
PB - International Machine Learning Society (IMLS)
T2 - 36th International Conference on Machine Learning, ICML 2019
Y2 - 9 June 2019 through 15 June 2019
ER -