Learning hidden features for contextual bandits

Huazheng Wang, Qingyun Wu, Hongning Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

73 Scopus citations

Abstract

Contextual bandit algorithms provide principled online learning solutions to find optimal trade-offs between exploration and exploitation with companion side-information. Most contextual bandit algorithms simply assume the learner would have access to the entire set of features, which govern the generation of payoffs from a user to an item. However, in practice it is challenging to exhaust all relevant features ahead of time, and oftentimes due to privacy or sampling constraints many factors are unobservable to the algorithm. Failing to model such hidden factors leads a system to make constantly suboptimal predictions. In this paper, we propose to learn the hidden features for contextual bandit algorithms. Hidden features are explicitly introduced in our reward generation assumption, in addition to the observable contextual features. A scalable bandit algorithm is achieved via coordinate descent, in which closed form solutions exist at each iteration for both hidden features and bandit parameters. Most importantly, we rigorously prove that the developed contextual bandit algorithm achieves a sublinear upper regret bound with high probability, and a linear regret is inevitable if one fails to model such hidden features. Extensive experimentation on both simulations and large-scale real-world datasets verified the advantages of the proposed algorithm compared with several state-of-the-art contextual bandit algorithms and existing ad-hoc combinations between bandit algorithms and matrix factorization methods.

Original languageEnglish (US)
Title of host publicationCIKM 2016 - Proceedings of the 2016 ACM Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Pages1633-1642
Number of pages10
ISBN (Electronic)9781450340731
DOIs
StatePublished - Oct 24 2016
Event25th ACM International Conference on Information and Knowledge Management, CIKM 2016 - Indianapolis, United States
Duration: Oct 24 2016Oct 28 2016

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings
Volume24-28-October-2016

Other

Other25th ACM International Conference on Information and Knowledge Management, CIKM 2016
Country/TerritoryUnited States
CityIndianapolis
Period10/24/1610/28/16

All Science Journal Classification (ASJC) codes

  • General Business, Management and Accounting
  • General Decision Sciences

Fingerprint

Dive into the research topics of 'Learning hidden features for contextual bandits'. Together they form a unique fingerprint.

Cite this