Towards Off-Policy Learning for Ranking Policies with Logged Feedback

Teng Xiao, Suhang Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

Probabilistic learning to rank (LTR) has been the dominant approach to optimizing ranking metrics, but it cannot maximize long-term rewards. Reinforcement learning models have been proposed to maximize users' long-term rewards by formulating recommendation as a sequential decision-making problem, but they achieve inferior accuracy compared to LTR counterparts, primarily due to the lack of online interactions and the characteristics of ranking. In this paper, we propose a new off-policy value ranking (VR) algorithm that simultaneously maximizes users' long-term rewards and optimizes the ranking metric offline, for improved sample efficiency, within a unified Expectation-Maximization (EM) framework. We show theoretically and empirically that the EM process guides the learned policy to enjoy the benefits of integrating the future reward and the ranking metric, and to learn without any online interactions. Extensive offline and online experiments demonstrate the effectiveness of our method.
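To make the abstract's idea concrete, below is a minimal, hypothetical sketch of an EM-style off-policy update for a ranking policy trained on logged feedback. It is not the authors' VR algorithm: the softmax policy, the assumption of a uniform logging policy, the importance-weighted value estimates, and all variable names are illustrative choices for this sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_features = 20, 8
theta = rng.normal(scale=0.1, size=(n_features, n_items))  # ranking policy parameters

# Toy logged feedback: (user features, shown/clicked item, observed reward),
# assumed here to come from a uniform logging policy.
logs = [(rng.normal(size=n_features), int(rng.integers(n_items)), float(rng.random()))
        for _ in range(500)]

def policy(x):
    """Softmax distribution over items given user features x."""
    logits = x @ theta
    logits = logits - logits.max()
    p = np.exp(logits)
    return p / p.sum()

for em_iter in range(20):
    # E-step (sketch): importance-weighted per-item value estimates from the
    # logged rewards, standing in for a posterior / long-term value model.
    weights = np.full(n_items, 1e-8)
    returns = np.zeros(n_items)
    for x, a, r in logs:
        w = policy(x)[a] * n_items        # pi(a|x) / (1/n_items), uniform logging policy
        weights[a] += w
        returns[a] += w * r
    q_values = returns / weights

    # M-step (sketch): weight the ranking log-likelihood gradient by the
    # estimated values, so the policy favors clicked *and* high-value items.
    grad = np.zeros_like(theta)
    for x, a, r in logs:
        p = policy(x)
        onehot = np.zeros(n_items)
        onehot[a] = 1.0
        grad += np.outer(x, q_values[a] * (onehot - p))
    theta += 0.01 * grad / len(logs)
```

The alternation is the point of the sketch: the value estimates (E-step) are computed off-policy from logged data, and the policy update (M-step) blends a ranking-style likelihood term with those estimated rewards rather than requiring online interaction.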

Original language: English (US)
Title of host publication: AAAI-22 Technical Tracks 8
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 8700-8707
Number of pages: 8
ISBN (Electronic): 1577358767, 9781577358763
DOIs
State: Published - Jun 30 2022
Event: 36th AAAI Conference on Artificial Intelligence, AAAI 2022 - Virtual, Online
Duration: Feb 22 2022 - Mar 1 2022

Publication series

Name: Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Volume: 36

Conference

Conference: 36th AAAI Conference on Artificial Intelligence, AAAI 2022
City: Virtual, Online
Period: 2/22/22 - 3/1/22

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
