Cascading Bandits with Two-Level Feedback

Duo Cheng, Ruiquan Huang, Cong Shen, Jing Yang

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution


Motivated by the engineering application of efficient mobility management in ultra-dense wireless networks, we propose a novel cost-aware cascading bandit model with two-level actions. Compared with the standard cascading bandit model with a single-level action, this new model captures the real-world action sequence in mobility management, where the base station not only decides on an ordered neighbor cell list before measurement, but also executes the final handover decision to the target base station. We first analyze the optimal offline policy when the arm statistics are known beforehand. An online learning algorithm, termed two-level Cost-aware Cascading UCB (CC-UCB), is then proposed to exploit the structure of the optimal offline policy using estimated arm statistics. Theoretical analysis shows that the cumulative regret of two-level CC-UCB scales logarithmically in time, which matches the asymptotic lower bound and is therefore order-optimal. Simulation results corroborate the theoretical results and validate the effectiveness of two-level CC-UCB for mobility management.
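The abstract does not spell out the CC-UCB index or the two-level decision rule, so the following is only a minimal sketch of the generic upper-confidence-bound idea that UCB-style algorithms build on: each arm is scored by its empirical mean plus an exploration bonus, and arms are ranked by that score. It is the textbook UCB1 index for rewards in [0, 1], not the paper's algorithm; the cost-aware and two-level components are omitted, and the function names `ucb_indices` and `ranked_arms` are hypothetical.

```python
import math


def ucb_indices(successes, pulls, t):
    """Textbook UCB1 index per arm: empirical mean + sqrt(2 ln t / n).

    successes[a] is the total reward observed from arm a, pulls[a] is the
    number of times arm a was observed, and t is the current round (t >= 1).
    This is an illustrative stand-in, not the index used by CC-UCB.
    """
    indices = []
    for s, n in zip(successes, pulls):
        if n == 0:
            # An arm that was never observed gets an infinite index,
            # so it is ranked first and explored at least once.
            indices.append(float("inf"))
        else:
            indices.append(s / n + math.sqrt(2.0 * math.log(t) / n))
    return indices


def ranked_arms(successes, pulls, t):
    """Return arm ids ordered from highest to lowest UCB1 index."""
    idx = ucb_indices(successes, pulls, t)
    return sorted(range(len(idx)), key=lambda a: idx[a], reverse=True)


if __name__ == "__main__":
    # Three hypothetical arms; the third has not been observed yet.
    print(ranked_arms([5, 1, 0], [10, 10, 0], t=20))
```

In a cascading setting the ranked list would play the role of the ordered list of arms to examine, here by analogy with the ordered neighbor cell list; how the paper combines such estimates with measurement costs and the final handover decision is not stated in the abstract.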

Original language: English (US)
Title of host publication: 2022 IEEE International Symposium on Information Theory, ISIT 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 5
ISBN (Electronic): 9781665421591
State: Published - 2022
Event: 2022 IEEE International Symposium on Information Theory, ISIT 2022 - Espoo, Finland
Duration: Jun 26 2022 to Jul 1 2022

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
ISSN (Print): 2157-8095


Conference: 2022 IEEE International Symposium on Information Theory, ISIT 2022

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics

