Stochastic Linear Contextual Bandits with Diverse Contexts

Weiqiang Wu, Jing Yang, Cong Shen

Research output: Contribution to journal › Conference article › peer-review

8 Scopus citations

Abstract

In this paper, we investigate the impact of context diversity on stochastic linear contextual bandits. As opposed to the previous view that contexts lead to more difficult bandit learning, we show that when the contexts are sufficiently diverse, the learner is able to utilize the information obtained during exploitation to shorten the exploration process, thus achieving reduced regret. We design the LinUCB-d algorithm, and propose a novel approach to analyze its regret performance. The main theoretical result is that under the diverse context assumption, the cumulative expected regret of LinUCB-d is bounded by a constant. As a by-product, our results improve the previous understanding of LinUCB and strengthen its performance guarantee.
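The abstract does not spell out LinUCB-d, so the following is only a minimal sketch of the standard per-arm ("disjoint") LinUCB rule that it builds on, not the authors' algorithm. The class `LinUCBArm`, the functions `select_arm` and `simulate`, the exploration weight `alpha`, the ridge parameter `reg`, and the two-arm problem with uniformly drawn contexts are all illustrative assumptions. Uniform contexts stand in for the kind of diversity the abstract describes, where every arm is optimal for some contexts and so keeps receiving samples during exploitation.

```python
import math
import random


class LinUCBArm:
    """Ridge-regression statistics for one arm (hypothetical helper)."""

    def __init__(self, dim, reg=1.0):
        # A_inv is the inverse of (reg * I + sum of x x^T over pulls of this arm).
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _matvec(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        theta = self._matvec(self.b)  # ridge estimate A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta, x))
        quad = sum(a * xi for a, xi in zip(self._matvec(x), x))  # x^T A^{-1} x
        return mean + alpha * math.sqrt(max(quad, 0.0))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse (A_inv is symmetric).
        A_inv_x = self._matvec(x)
        denom = 1.0 + sum(a * xi for a, xi in zip(A_inv_x, x))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
            self.b[i] += reward * x[i]


def select_arm(arms, x, alpha):
    """Return the index of the arm with the largest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


def simulate(rounds=2000, alpha=1.0, seed=0):
    """Run LinUCB on a made-up two-arm problem and return cumulative regret."""
    rng = random.Random(seed)
    true_theta = [[1.0, 0.0], [0.0, 1.0]]  # assumed parameters, not from the paper
    arms = [LinUCBArm(dim=2) for _ in true_theta]
    regret = 0.0
    for _ in range(rounds):
        # Contexts drawn uniformly from [-1, 1]^2, so each arm is optimal at times.
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        means = [sum(t * xi for t, xi in zip(theta, x)) for theta in true_theta]
        chosen = select_arm(arms, x, alpha)
        reward = means[chosen] + rng.gauss(0.0, 0.1)
        arms[chosen].update(x, reward)
        regret += max(means) - means[chosen]
    return regret
```

Calling `simulate()` returns the cumulative regret of one seeded run; recording it at several horizons is one informal way to see whether it flattens out, though a single simulation on a toy problem says nothing about the paper's bound.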

Original language: English (US)
Pages (from-to): 2392-2401
Number of pages: 10
Journal: Proceedings of Machine Learning Research
Volume: 108
State: Published - 2020
Event: 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020 - Virtual, Online
Duration: Aug 26 2020 to Aug 28 2020

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
