Learning from Delayed Semi-Bandit Feedback under Strong Fairness Guarantees

Juaren Steiger, Bin Li, Ning Lu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Scopus citations

Abstract

Multi-armed bandit frameworks, including combinatorial semi-bandits and sleeping bandits, are commonly employed to model problems in communication networks and other engineering domains. In such problems, feedback to the learning agent is often delayed (e.g., communication delays in a wireless network or conversion delays in online advertising). Moreover, arms in a bandit problem often represent entities that must be treated fairly, i.e., each arm should be played at least a required fraction of the time. In contrast to the previously studied asymptotic fairness, many real-time systems require such fairness guarantees to hold even in the short term (e.g., ensuring the credibility of information flows in an industrial Internet of Things (IoT) system). To that end, we develop the Learning with Delays under Fairness (LDF) algorithm to solve combinatorial semi-bandit problems with sleeping arms and delayed feedback, and we prove that it guarantees strong (short-term) fairness. While previous theoretical work on bandit problems with delayed feedback typically derives instance-dependent regret bounds, this approach proves challenging when fairness is considered simultaneously. We instead derive a novel instance-independent regret bound in this setting which agrees with state-of-the-art bounds. We verify our theoretical results with extensive simulations using both synthetic and real-world datasets.
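The setting described in the abstract can be illustrated with a toy simulation. The sketch below is not the LDF algorithm and makes several simplifying assumptions that do not come from the paper: all arms are always available (no sleeping), the delay is a fixed constant, rewards are Bernoulli, and the selection rule is a hypothetical "serve arms behind their quota first, then pick greedily" heuristic. The names `simulate`, `min_frac`, and `worst_gap` are illustrative. The function reports how far any arm ever fell below its required fraction of plays, which is the quantity a short-term fairness guarantee would bound.

```python
import random
from collections import deque


def simulate(means, min_frac, m, delay, horizon, seed=0):
    """Toy delayed semi-bandit loop with a fairness-first selection rule.

    Hypothetical heuristic for illustration only; it is NOT the LDF algorithm.
    means[i]    : Bernoulli reward mean of arm i (unknown to the agent)
    min_frac[i] : required fraction of rounds in which arm i is played
    m           : number of arms played per round
    delay       : fixed number of rounds before feedback arrives
    """
    rng = random.Random(seed)
    k = len(means)
    plays = [0] * k      # times each arm has been played
    seen = [0] * k       # feedback samples received so far
    total = [0.0] * k    # sum of received rewards
    pending = deque()    # (arrival_round, arm, reward), in arrival order
    worst_gap = 0.0      # largest shortfall of plays below min_frac * t

    for t in range(1, horizon + 1):
        # Deliver feedback whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, reward = pending.popleft()
            seen[arm] += 1
            total[arm] += reward

        # Arms that are behind their quota at round t are served first,
        # most-behind first.
        owed = [i for i in range(k) if plays[i] < min_frac[i] * t]
        owed.sort(key=lambda i: plays[i] - min_frac[i] * t)
        chosen = owed[:m]

        # Fill remaining slots greedily by empirical mean, using only the
        # feedback that has already arrived (unseen arms are tried first).
        def score(i):
            return float("inf") if seen[i] == 0 else total[i] / seen[i]

        rest = sorted((i for i in range(k) if i not in chosen),
                      key=score, reverse=True)
        chosen += rest[: m - len(chosen)]

        # Play the chosen arms; each reward is observed only after `delay`.
        for i in chosen:
            plays[i] += 1
            reward = 1.0 if rng.random() < means[i] else 0.0
            pending.append((t + delay, i, reward))

        gap = max(min_frac[i] * t - plays[i] for i in range(k))
        worst_gap = max(worst_gap, gap)

    return plays, worst_gap


if __name__ == "__main__":
    plays, worst_gap = simulate(
        means=[0.9, 0.1], min_frac=[0.5, 0.5], m=1, delay=3, horizon=10
    )
    print(plays, worst_gap)
```

With two arms that each require half of the plays and one slot per round, the quota rule forces the arms to alternate, so the reward estimates never influence the choice. Lowering `min_frac` leaves slots for the greedy step, where the effect of the feedback delay becomes visible.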

Original language: English (US)
Title of host publication: INFOCOM 2022 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1379-1388
Number of pages: 10
ISBN (Electronic): 9781665458221
State: Published - 2022
Event: 41st IEEE Conference on Computer Communications, INFOCOM 2022 - Virtual, Online, United Kingdom
Duration: May 2, 2022 to May 5, 2022

Publication series

Name: Proceedings - IEEE INFOCOM
Volume: 2022-May
ISSN (Print): 0743-166X

Conference

Conference: 41st IEEE Conference on Computer Communications, INFOCOM 2022
Country/Territory: United Kingdom
City: Virtual, Online
Period: 5/2/22 to 5/5/22

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Electrical and Electronic Engineering
