Abstract
This paper considers the problem of inferring the behaviors of multiple interacting experts by estimating their reward functions and constraints where the distributed demonstrated trajectories are sequentially revealed to a group of learners. We formulate the problem as a distributed online bi-level optimization problem where the outer-level problem is to estimate the reward functions and the inner-level problem is to learn the constraints and corresponding policies. We propose a novel “multi-agent behavior inference from distributed and streaming demonstrations" (MA-BIRDS) algorithm that allows the learners to solve the outer-level and inner-level problems in a single loop through intermittent communications. We formally guarantee that the distributed learners achieve consensus on reward functions, constraints, and policies, the average local regret (over N online iterations) decreases at the rate of O(1/N1−η1 +1/N1−η2 +1/N), and the cumulative constraint violation increases sub-linearly at the rate of O(Nη2 + 1) where η1, η2 ∈ (1/2, 1).
Original language | English (US) |
---|---|
Pages (from-to) | 53552-53564 |
Number of pages | 13 |
Journal | Advances in Neural Information Processing Systems |
Volume | 36 |
State | Published - 2023 |
Event | 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States Duration: Dec 10 2023 → Dec 16 2023 |
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems
- Signal Processing