TY - GEN
T1 - Cascading Bandits with Two-Level Feedback
AU - Cheng, Duo
AU - Huang, Ruiquan
AU - Shen, Cong
AU - Yang, Jing
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Motivated by the engineering application of efficient mobility management in ultra-dense wireless networks, we propose a novel cost-aware cascading bandit model with two-level actions. Compared with the standard cascading bandit model with a single-level action, this new model captures the real-world action sequence in mobility management, where the base station not only decides on an ordered neighbor cell list before measurement, but also executes the final handover decision to the target base station. We first analyze the optimal offline policy when the arm statistics are known beforehand. An online learning algorithm termed two-level Cost-aware Cascading UCB (CC-UCB) is then proposed to exploit the structure of the optimal offline policy with estimated arm statistics. Theoretical analysis shows that the cumulative regret under two-level CC-UCB scales logarithmically in time, which matches the asymptotic lower bound and is therefore order-optimal. Simulation results corroborate the theoretical results and validate the effectiveness of two-level CC-UCB for mobility management.
UR - http://www.scopus.com/inward/record.url?scp=85136314435&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136314435&partnerID=8YFLogxK
U2 - 10.1109/ISIT50566.2022.9834892
DO - 10.1109/ISIT50566.2022.9834892
M3 - Conference contribution
AN - SCOPUS:85136314435
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 1892
EP - 1896
BT - 2022 IEEE International Symposium on Information Theory, ISIT 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Symposium on Information Theory, ISIT 2022
Y2 - 26 June 2022 through 1 July 2022
ER -