Multi Page Search with Reinforcement Learning to Rank

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval
Pub Date: 2018-09-10
DOI: 10.1145/3234944.3234977

Wei Zeng, Jun Xu, Yanyan Lan, J. Guo, Xueqi Cheng


Citations: 21

Abstract

Web search engines typically present results over multiple pages, and users engaged in exploratory search with ad hoc queries are likely to access more than one result page. For such queries, the ranking of web pages should take into account information beyond the original query, e.g., the user's clicks on previous result pages. Existing methods that exploit this kind of information usually rely on relevance feedback, which uses the feedback to explore the user's intent. However, because of limitations of the feedback mechanism, existing relevance feedback techniques are difficult to apply to state-of-the-art learning-to-rank models. In this paper, we propose a novel learning-to-rank model for multi-page search in which the user's feedback is naturally used to improve the ranking of the next result page. The model, referred to as MDP-MPS, formalizes document ranking in multi-page search as a Markov decision process (MDP) in which the search engine acts as the agent that constructs the document ranking on each result page, and the user acts as the environment that judges the rankings and provides rewards. The model parameters are learned with REINFORCE, a policy gradient algorithm. Experimental results on the OHSUMED dataset showed that our approach outperformed two baselines: the traditional relevance ranking model ListNet and the Rocchio relevance feedback method.
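The abstract does not spell out the state, action, or reward definitions of MDP-MPS, so the following is only a minimal, hypothetical sketch of the general idea: ranking several result pages as an MDP and learning a policy with REINFORCE. It assumes a linear softmax policy that picks one document per position, a DCG-style reward computed from relevance labels, a simulated user who clicks every relevant document on a page, and a feedback feature built from the centroid of clicked documents. The names `state_features`, `run_episode`, and `train`, as well as the toy data, are illustrative and do not come from the paper.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def state_features(doc, centroid):
    # Hypothetical state-dependent features: the original query-document
    # features plus their elementwise product with the centroid of documents
    # clicked on earlier pages (all zeros while ranking the first page).
    return doc + [x * c for x, c in zip(doc, centroid)]


def centroid_of(vectors, dim):
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def run_episode(w, docs, rels, pages, page_size, rng=None):
    """Fill `pages` result pages of `page_size` documents each.

    Samples actions from the softmax policy when `rng` is given, otherwise
    acts greedily. Returns (ranking, per-step rewards, per-step grad log pi).
    """
    dim = len(docs[0])
    remaining = list(range(len(docs)))
    clicked, ranking, rewards, grads = [], [], [], []
    for _ in range(pages):
        centroid = centroid_of([docs[i] for i in clicked], dim)
        shown = []
        for _ in range(page_size):
            phis = [state_features(docs[i], centroid) for i in remaining]
            scores = [dot(w, phi) for phi in phis]
            probs = softmax(scores)
            if rng is None:
                k = max(range(len(remaining)), key=scores.__getitem__)
            else:
                k = rng.choices(range(len(remaining)), weights=probs)[0]
            # For a linear softmax policy, grad log pi(a|s) = phi(a) - E_pi[phi].
            expected = [sum(p * phi[i] for p, phi in zip(probs, phis))
                        for i in range(len(w))]
            grads.append([a - e for a, e in zip(phis[k], expected)])
            doc = remaining.pop(k)
            t = len(ranking)  # 0-based position across all pages
            # Assumed reward: DCG-style gain of the chosen document at position t.
            rewards.append((2 ** rels[doc] - 1) / math.log2(t + 2))
            ranking.append(doc)
            shown.append(doc)
        # Simulated user feedback (assumption): every relevant shown doc is clicked.
        clicked.extend(d for d in shown if rels[d] > 0)
    return ranking, rewards, grads


def train(docs, rels, pages=2, page_size=2, episodes=300, lr=0.05, gamma=1.0, seed=0):
    """Plain REINFORCE: w += lr * gamma^t * G_t * grad log pi(a_t|s_t)."""
    rng = random.Random(seed)
    w = [0.0] * (2 * len(docs[0]))
    for _ in range(episodes):
        _, rewards, grads = run_episode(w, docs, rels, pages, page_size, rng)
        ret = 0.0
        # Walk backwards so that `ret` equals the return G_t from step t.
        for t in reversed(range(len(rewards))):
            ret = rewards[t] + gamma * ret
            w = [wi + lr * (gamma ** t) * ret * g for wi, g in zip(w, grads[t])]
    return w


# Toy query: two features per document; relevance labels are graded 0-2.
docs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0], [0.2, 0.8]]
rels = [2, 1, 1, 0, 0, 0]
w = train(docs, rels)
ranking, _, _ = run_episode(w, docs, rels, pages=2, page_size=2)
print("learned weights:", [round(x, 3) for x in w])
print("greedy two-page ranking:", ranking)
```

Plain REINFORCE without a baseline is used here only because it is the algorithm the abstract names; subtracting a baseline from G_t is a common variance-reduction tweak. Greedy decoding at test time, the click model, and the centroid feedback feature are all choices of this sketch, not details confirmed by the paper.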
