{"title":"SeaRank: relevance prediction based on click models in a reinforcement learning framework","authors":"A. Keyhanipour, F. Oroumchian","doi":"10.1108/dta-01-2022-0001","DOIUrl":null,"url":null,"abstract":"PurposeUser feedback inferred from the user's search-time behavior could improve the learning to rank (L2R) algorithms. Click models (CMs) present probabilistic frameworks for describing and predicting the user's clicks during search sessions. Most of these CMs are based on common assumptions such as Attractiveness, Examination and User Satisfaction. CMs usually consider the Attractiveness and Examination as pre- and post-estimators of the actual relevance. They also assume that User Satisfaction is a function of the actual relevance. This paper extends the authors' previous work by building a reinforcement learning (RL) model to predict the relevance. The Attractiveness, Examination and User Satisfaction are estimated using a limited number of the features of the utilized benchmark data set and then they are incorporated in the construction of an RL agent. The proposed RL model learns to predict the relevance label of documents with respect to a given query more effectively than the baseline RL models for those data sets.Design/methodology/approachIn this paper, User Satisfaction is used as an indication of the relevance level of a query to a document. User Satisfaction itself is estimated through Attractiveness and Examination, and in turn, Attractiveness and Examination are calculated by the random forest algorithm. In this process, only a small subset of top information retrieval (IR) features are used, which are selected based on their mean average precision and normalized discounted cumulative gain values. Based on the authors' observations, the multiplication of the Attractiveness and Examination values of a given query–document pair closely approximates the User Satisfaction and hence the relevance level. Besides, an RL model is designed in such a way that the current state of the RL agent is determined by discretization of the estimated Attractiveness and Examination values. In this way, each query–document pair would be mapped into a specific state based on its Attractiveness and Examination values. Then, based on the reward function, the RL agent would try to choose an action (relevance label) which maximizes the received reward in its current state. Using temporal difference (TD) learning algorithms, such as Q-learning and SARSA, the learning agent gradually learns to identify an appropriate relevance label in each state. The reward that is used in the RL agent is proportional to the difference between the User Satisfaction and the selected action.FindingsExperimental results on MSLR-WEB10K and WCL2R benchmark data sets demonstrate that the proposed algorithm, named as SeaRank, outperforms baseline algorithms. Improvement is more noticeable in top-ranked results, which usually receive more attention from users.Originality/valueThis research provides a mapping from IR features to the CM features and thereafter utilizes these newly generated features to build an RL model. This RL model is proposed with the definition of the states, actions and reward function. 
By applying TD learning algorithms, such as the Q-learning and SARSA, within several learning episodes, the RL agent would be able to learn how to choose the most appropriate relevance label for a given pair of query–document.","PeriodicalId":56156,"journal":{"name":"Data Technologies and Applications","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Technologies and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1108/dta-01-2022-0001","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
Purpose
User feedback inferred from the user's search-time behavior can improve learning to rank (L2R) algorithms. Click models (CMs) provide probabilistic frameworks for describing and predicting a user's clicks during search sessions. Most CMs are built on common assumptions such as Attractiveness, Examination and User Satisfaction. CMs usually treat Attractiveness and Examination as pre- and post-estimators of the actual relevance, and assume that User Satisfaction is a function of the actual relevance. This paper extends the authors' previous work by building a reinforcement learning (RL) model to predict relevance. Attractiveness, Examination and User Satisfaction are estimated from a limited number of features of the benchmark data sets and are then incorporated into the construction of an RL agent. The proposed RL model learns to predict the relevance label of documents with respect to a given query more effectively than the baseline RL models on those data sets.

Design/methodology/approach
In this paper, User Satisfaction is used as an indication of the relevance level of a document to a query. User Satisfaction itself is estimated through Attractiveness and Examination, which are in turn calculated by the random forest algorithm. Only a small subset of top information retrieval (IR) features is used in this process, selected according to their mean average precision and normalized discounted cumulative gain values. Based on the authors' observations, the product of the Attractiveness and Examination values of a given query–document pair closely approximates the User Satisfaction and hence the relevance level. The RL model is designed so that the current state of the agent is determined by discretizing the estimated Attractiveness and Examination values; each query–document pair is thus mapped to a specific state according to those values. Based on the reward function, the RL agent then tries to choose the action (relevance label) that maximizes the reward received in its current state. Using temporal difference (TD) learning algorithms, such as Q-learning and SARSA, the agent gradually learns to identify an appropriate relevance label in each state. The reward used by the RL agent is proportional to the difference between the User Satisfaction and the selected action.

Findings
Experimental results on the MSLR-WEB10K and WCL2R benchmark data sets demonstrate that the proposed algorithm, named SeaRank, outperforms baseline algorithms. The improvement is more noticeable in top-ranked results, which usually receive more attention from users.

Originality/value
This research provides a mapping from IR features to CM features and then uses these newly generated features to build an RL model, defined by its states, actions and reward function. By applying TD learning algorithms such as Q-learning and SARSA over several learning episodes, the RL agent learns to choose the most appropriate relevance label for a given query–document pair.
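The sketch below is a minimal, self-contained illustration of the pipeline described in the Design/methodology/approach section, not the authors' implementation. It assumes scikit-learn's RandomForestRegressor as the random forest, synthetic feature data and Attractiveness/Examination targets, five discretization bins, five relevance labels, a reward defined as the negative absolute gap between the satisfaction-implied label and the chosen label, and a one-step (gamma = 0) Q-learning update; all of these choices are illustrative assumptions where the abstract does not pin down details.

```python
# Hedged sketch of the SeaRank idea from the abstract: estimate Attractiveness (A)
# and Examination (E) from a few IR features, approximate User Satisfaction as
# US = A * E, discretize (A, E) into a state, and let a tabular Q-learning agent
# pick a relevance label (action) whose reward grows as the label approaches US.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# --- 1. Estimate Attractiveness and Examination from IR features ------------
n_pairs, n_features = 2000, 5             # small subset of top IR features (assumed)
X = rng.random((n_pairs, n_features))     # stand-in for query-document feature vectors
attr_true = 0.7 * X[:, 0] + 0.3 * rng.random(n_pairs)   # synthetic targets, illustrative only
exam_true = 0.6 * X[:, 1] + 0.4 * rng.random(n_pairs)

attr_model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, attr_true)
exam_model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, exam_true)

A = attr_model.predict(X)                 # estimated Attractiveness, roughly in [0, 1]
E = exam_model.predict(X)                 # estimated Examination, roughly in [0, 1]
US = A * E                                # User Satisfaction ~ A * E, per the abstract

# --- 2. Discretize (A, E) into states; define actions and reward ------------
n_bins, n_labels = 5, 5                   # assumed granularity and label count

def state_of(a, e):
    ai = min(int(a * n_bins), n_bins - 1)
    ei = min(int(e * n_bins), n_bins - 1)
    return ai * n_bins + ei               # one state per (A-bin, E-bin) cell

def reward(us, action):
    # Reward shrinks as the chosen label moves away from the satisfaction-implied
    # label; the exact functional form is an assumption.
    return -abs(us * (n_labels - 1) - action)

# --- 3. Tabular Q-learning over several episodes ----------------------------
Q = np.zeros((n_bins * n_bins, n_labels))
alpha, eps = 0.1, 0.1                     # gamma taken as 0: each pair is a one-step task

for episode in range(50):
    for i in rng.permutation(n_pairs):
        s = state_of(A[i], E[i])
        a = rng.integers(n_labels) if rng.random() < eps else int(Q[s].argmax())
        r = reward(US[i], a)
        Q[s, a] += alpha * (r - Q[s, a])  # one-step TD update (no successor state)

# Predicted relevance label for a new query-document feature vector:
def predict_label(x_row):
    a = attr_model.predict(x_row.reshape(1, -1))[0]
    e = exam_model.predict(x_row.reshape(1, -1))[0]
    return int(Q[state_of(a, e)].argmax())

print(predict_label(X[0]))
```

Treating each query–document pair as a one-step episode is a simplification made here for brevity; the paper's SARSA variant and any cross-state transitions would require the full TD target instead of the bandit-style update shown above.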