TY - GEN
T1 - Dynamic ensemble of contextual bandits to satisfy users' changing interests
AU - Wu, Qingyun
AU - Wang, Huazheng
AU - Li, Yanen
AU - Wang, Hongning
N1 - Funding Information:
This work was supported in part by National Science Foundation Grants IIS-1553568 and IIS-1618948, and a Bloomberg Data Science PhD Fellowship.
Publisher Copyright:
© 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC BY 4.0 License.
PY - 2019/5/13
Y1 - 2019/5/13
N2 - Recommender systems have to handle a highly non-stationary environment, due to users' fast-changing interests over time. Traditional solutions periodically rebuild their models at high computational cost, yet this still does not enable them to adjust automatically to abrupt shifts in trends driven by timely information. It is important to note that the changes in reward distributions caused by a non-stationary environment can also be context dependent. When a change is orthogonal to the given context, previously maintained models should be reused for better recommendation prediction. In this work, we focus on contextual bandit algorithms for making adaptive recommendations. We capitalize on this context-dependent property of reward changes to handle the challenging non-stationary environment during model update. In particular, we maintain a dynamic ensemble of contextual bandit models, where each bandit model's reward estimation quality is monitored with respect to the given context and possible environment changes, and only the models admissible to the current environment are used for recommendation. We provide a rigorous upper regret bound analysis of the proposed algorithm. Extensive empirical evaluations on a synthetic dataset and three real-world datasets confirm the algorithm's advantage over existing non-stationary solutions that simply create a new model whenever an environment change is detected.
AB - Recommender systems have to handle a highly non-stationary environment, due to users' fast-changing interests over time. Traditional solutions periodically rebuild their models at high computational cost, yet this still does not enable them to adjust automatically to abrupt shifts in trends driven by timely information. It is important to note that the changes in reward distributions caused by a non-stationary environment can also be context dependent. When a change is orthogonal to the given context, previously maintained models should be reused for better recommendation prediction. In this work, we focus on contextual bandit algorithms for making adaptive recommendations. We capitalize on this context-dependent property of reward changes to handle the challenging non-stationary environment during model update. In particular, we maintain a dynamic ensemble of contextual bandit models, where each bandit model's reward estimation quality is monitored with respect to the given context and possible environment changes, and only the models admissible to the current environment are used for recommendation. We provide a rigorous upper regret bound analysis of the proposed algorithm. Extensive empirical evaluations on a synthetic dataset and three real-world datasets confirm the algorithm's advantage over existing non-stationary solutions that simply create a new model whenever an environment change is detected.
UR - http://www.scopus.com/inward/record.url?scp=85066916130&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066916130&partnerID=8YFLogxK
U2 - 10.1145/3308558.3313727
DO - 10.1145/3308558.3313727
M3 - Conference contribution
AN - SCOPUS:85066916130
T3 - The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019
SP - 2080
EP - 2090
BT - The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019
PB - Association for Computing Machinery, Inc
T2 - 2019 World Wide Web Conference, WWW 2019
Y2 - 13 May 2019 through 17 May 2019
ER -