TY - JOUR
T1 - Federated reinforcement learning for robot motion planning with zero-shot generalization
AU - Yuan, Zhenyuan
AU - Xu, Siyuan
AU - Zhu, Minghui
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2024/8
Y1 - 2024/8
N2 - This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., no data collection or policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning among multiple learners and a central server, i.e., the Cloud, without sharing their raw data. In each iteration, each learner uploads its local control policy and the corresponding estimated normalized arrival time to the Cloud, which then computes the global optimum among the learners and broadcasts the optimal policy to the learners. Each learner then selects between its local control policy and that from the Cloud for the next iteration. The proposed framework leverages the derived zero-shot generalization guarantees on arrival time and safety. Theoretical guarantees on almost-sure convergence, almost consensus, Pareto improvement and optimality gap are also provided. Monte Carlo simulations are conducted to evaluate the proposed framework.
AB - This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., no data collection or policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning among multiple learners and a central server, i.e., the Cloud, without sharing their raw data. In each iteration, each learner uploads its local control policy and the corresponding estimated normalized arrival time to the Cloud, which then computes the global optimum among the learners and broadcasts the optimal policy to the learners. Each learner then selects between its local control policy and that from the Cloud for the next iteration. The proposed framework leverages the derived zero-shot generalization guarantees on arrival time and safety. Theoretical guarantees on almost-sure convergence, almost consensus, Pareto improvement and optimality gap are also provided. Monte Carlo simulations are conducted to evaluate the proposed framework.
UR - http://www.scopus.com/inward/record.url?scp=85194060891&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85194060891&partnerID=8YFLogxK
U2 - 10.1016/j.automatica.2024.111709
DO - 10.1016/j.automatica.2024.111709
M3 - Article
AN - SCOPUS:85194060891
SN - 0005-1098
VL - 166
JO - Automatica
JF - Automatica
M1 - 111709
ER -