TY - JOUR
T1 - Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
AU - Chen, Yi
AU - Wang, Yining
AU - Fang, Ethan X.
AU - Wang, Zhaoran
AU - Li, Runze
N1 - Publisher Copyright:
© 2022 American Statistical Association.
PY - 2024
Y1 - 2024
N2 - We consider the stochastic contextual bandit problem under the high dimensional linear model. We focus on the case where the action space is finite and random, with each action associated with a randomly generated contextual covariate. This setting finds essential applications such as personalized recommendations, online advertisements, and personalized medicine. However, it is very challenging to balance the exploration and exploitation tradeoff. We modify the LinUCB algorithm in doubly growing epochs and estimate the parameter using the best subset selection method, which is easy to implement in practice. This approach achieves (Formula presented.) regret with high probability, which is nearly independent of the “ambient” regression model dimension d. We further attain a sharper (Formula presented.) regret by using the SupLinUCB framework and match the minimax lower bound of the low-dimensional linear stochastic bandit problem. Finally, we conduct extensive numerical experiments to empirically demonstrate our algorithms’ applicability and robustness. Supplementary materials for this article are available online.
AB - We consider the stochastic contextual bandit problem under the high dimensional linear model. We focus on the case where the action space is finite and random, with each action associated with a randomly generated contextual covariate. This setting finds essential applications such as personalized recommendations, online advertisements, and personalized medicine. However, it is very challenging to balance the exploration and exploitation tradeoff. We modify the LinUCB algorithm in doubly growing epochs and estimate the parameter using the best subset selection method, which is easy to implement in practice. This approach achieves (Formula presented.) regret with high probability, which is nearly independent of the “ambient” regression model dimension d. We further attain a sharper (Formula presented.) regret by using the SupLinUCB framework and match the minimax lower bound of the low-dimensional linear stochastic bandit problem. Finally, we conduct extensive numerical experiments to empirically demonstrate our algorithms’ applicability and robustness. Supplementary materials for this article are available online.
UR - http://www.scopus.com/inward/record.url?scp=85139148493&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139148493&partnerID=8YFLogxK
U2 - 10.1080/01621459.2022.2108816
DO - 10.1080/01621459.2022.2108816
M3 - Article
AN - SCOPUS:85139148493
SN - 0162-1459
VL - 119
SP - 246
EP - 258
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 545
ER -