TY - GEN
T1 - On the Low-Complexity of Fair Learning for Combinatorial Multi-Armed Bandit
AU - Wu, Xiaoyi
AU - Ji, Bo
AU - Li, Bin
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Combinatorial Multi-Armed Bandit with fairness constraints is a framework where multiple arms form a super arm and can be pulled in each round under uncertainty to maximize cumulative rewards while ensuring the minimum average reward required by each arm. The existing pessimistic-optimistic algorithm linearly combines virtual queue-lengths (tracking the fairness violations) and Upper Confidence Bound estimates as a weight for each arm and selects a super arm with the maximum total weight. The number of super arms could be exponential in the number of arms in many scenarios. In wireless networks, due to interference constraints the number of super arms can grow exponentially with the number of arms. Evaluating all the feasible super arms to find the one with the maximum total weight can incur extremely high computational complexity in the pessimistic-optimistic algorithm. To tackle this issue, we develop a low-complexity fair learning algorithm based on the so-called pick-and-compare approach that involves randomly picking M feasible super arms to evaluate. By setting M to a constant, the number of comparison steps in the pessimistic-optimistic algorithm can be reduced to a constant, thereby significantly reducing the computational complexity. The theoretical analysis shows that our low-complexity design sacrifices fairness and regret performance only marginally. Finally, we validate our theoretical results through extensive simulations.
AB - Combinatorial Multi-Armed Bandit with fairness constraints is a framework where multiple arms form a super arm and can be pulled in each round under uncertainty to maximize cumulative rewards while ensuring the minimum average reward required by each arm. The existing pessimistic-optimistic algorithm linearly combines virtual queue-lengths (tracking the fairness violations) and Upper Confidence Bound estimates as a weight for each arm and selects a super arm with the maximum total weight. The number of super arms could be exponential in the number of arms in many scenarios. In wireless networks, due to interference constraints the number of super arms can grow exponentially with the number of arms. Evaluating all the feasible super arms to find the one with the maximum total weight can incur extremely high computational complexity in the pessimistic-optimistic algorithm. To tackle this issue, we develop a low-complexity fair learning algorithm based on the so-called pick-and-compare approach that involves randomly picking M feasible super arms to evaluate. By setting M to a constant, the number of comparison steps in the pessimistic-optimistic algorithm can be reduced to a constant, thereby significantly reducing the computational complexity. The theoretical analysis shows that our low-complexity design sacrifices fairness and regret performance only marginally. Finally, we validate our theoretical results through extensive simulations.
UR - https://www.scopus.com/pages/publications/105011057034
UR - https://www.scopus.com/pages/publications/105011057034#tab=citedBy
U2 - 10.1109/INFOCOM55648.2025.11044728
DO - 10.1109/INFOCOM55648.2025.11044728
M3 - Conference contribution
AN - SCOPUS:105011057034
T3 - Proceedings - IEEE INFOCOM
BT - INFOCOM 2025 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE Conference on Computer Communications, INFOCOM 2025
Y2 - 19 May 2025 through 22 May 2025
ER -