TY - GEN
T1 - Backlogged Bandits
T2 - 43rd IEEE Conference on Computer Communications, INFOCOM 2024
AU - Steiger, Juaren
AU - Li, Bin
AU - Lu, Ning
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Bipartite queueing networks with unknown statistics, where jobs are routed to and queued at servers and yield server-dependent utilities upon completion, model a wide range of problems in communications and related research areas (e.g., call routing in call centers, task assignment in crowdsourcing, job dispatching to cloud servers). The utility maximization problem in bipartite queueing networks with unknown statistics is a bandit learning problem where the delayed semi-bandit feedback depends on the server queueing delay. In this paper, we propose an efficient algorithm that overcomes the technical shortcomings of the state-of-the-art and achieves square root regret, queue length, and feedback delay. Our approach also accommodates additional constraints, such as quality of service, fairness, and budgeted cost constraints, with constant expected peak violation and zero expected violation after a fixed timeslot. Empirically, our algorithm's regret is competitive with the state-of-the-art for some problem instances and outperforms it in others, with much lower delay and constraint violation.
AB - Bipartite queueing networks with unknown statistics, where jobs are routed to and queued at servers and yield server-dependent utilities upon completion, model a wide range of problems in communications and related research areas (e.g., call routing in call centers, task assignment in crowdsourcing, job dispatching to cloud servers). The utility maximization problem in bipartite queueing networks with unknown statistics is a bandit learning problem where the delayed semi-bandit feedback depends on the server queueing delay. In this paper, we propose an efficient algorithm that overcomes the technical shortcomings of the state-of-the-art and achieves square root regret, queue length, and feedback delay. Our approach also accommodates additional constraints, such as quality of service, fairness, and budgeted cost constraints, with constant expected peak violation and zero expected violation after a fixed timeslot. Empirically, our algorithm's regret is competitive with the state-of-the-art for some problem instances and outperforms it in others, with much lower delay and constraint violation.
UR - https://www.scopus.com/pages/publications/85201805535
UR - https://www.scopus.com/pages/publications/85201805535#tab=citedBy
U2 - 10.1109/INFOCOM52122.2024.10621295
DO - 10.1109/INFOCOM52122.2024.10621295
M3 - Conference contribution
AN - SCOPUS:85201805535
T3 - Proceedings - IEEE INFOCOM
SP - 381
EP - 390
BT - IEEE INFOCOM 2024 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 May 2024 through 23 May 2024
ER -