TY - GEN
T1 - Index-aware reinforcement learning for adaptive video streaming at the wireless edge
AU - Xiong, Guojun
AU - Qin, Xudong
AU - Li, Bin
AU - Singh, Rahul
AU - Li, Jian
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/10/3
Y1 - 2022/10/3
N2 - We study adaptive video streaming for multiple users in wireless access edge networks with unreliable channels. The key challenge is to jointly optimize the video bitrate adaptation and resource allocation such that the users' cumulative quality of experience is maximized. This problem is a finite-horizon restless multi-armed multi-action bandit problem and is provably hard to solve. To overcome this challenge, we propose a computationally appealing index policy entitled Quality Index Policy, which is well-defined without the Whittle indexability condition and is provably asymptotically optimal without the global attractor condition. These two conditions are widely needed in the design of most existing index policies, which are difficult to establish in general. Since the wireless access edge network environment is highly dynamic with system parameters unknown and time-varying, we further develop an index-aware reinforcement learning (RL) algorithm dubbed QA-UCB. We show that QA-UCB achieves a sub-linear regret with a low-complexity since it fully exploits the structure of the Quality Index Policy for making decisions. Extensive simulations using real-world traces demonstrate significant gains of proposed policies over conventional approaches. We note that the proposed framework for designing index policy and index-aware RL algorithm is of independent interest and could be useful for other large-scale multi-user problems.
AB - We study adaptive video streaming for multiple users in wireless access edge networks with unreliable channels. The key challenge is to jointly optimize the video bitrate adaptation and resource allocation such that the users' cumulative quality of experience is maximized. This problem is a finite-horizon restless multi-armed multi-action bandit problem and is provably hard to solve. To overcome this challenge, we propose a computationally appealing index policy entitled Quality Index Policy, which is well-defined without the Whittle indexability condition and is provably asymptotically optimal without the global attractor condition. These two conditions are widely needed in the design of most existing index policies, which are difficult to establish in general. Since the wireless access edge network environment is highly dynamic with system parameters unknown and time-varying, we further develop an index-aware reinforcement learning (RL) algorithm dubbed QA-UCB. We show that QA-UCB achieves a sub-linear regret with a low-complexity since it fully exploits the structure of the Quality Index Policy for making decisions. Extensive simulations using real-world traces demonstrate significant gains of proposed policies over conventional approaches. We note that the proposed framework for designing index policy and index-aware RL algorithm is of independent interest and could be useful for other large-scale multi-user problems.
UR - http://www.scopus.com/inward/record.url?scp=85139619693&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139619693&partnerID=8YFLogxK
U2 - 10.1145/3492866.3549726
DO - 10.1145/3492866.3549726
M3 - Conference contribution
AN - SCOPUS:85139619693
T3 - Proceedings of the International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc)
SP - 81
EP - 90
BT - MobiHoc 2022 - Proceedings of the 2022 23rd International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
PB - Association for Computing Machinery
T2 - 23rd ACM International Symposium on Mobile Ad Hoc Networking and Computing, MobiHoc 2022
Y2 - 17 October 2022 through 20 October 2022
ER -