User model-based metrics for offline query suggestion evaluation

E. Kharitonov, C. Macdonald, P. Serdyukov, I. Ounis
{"title":"User model-based metrics for offline query suggestion evaluation","authors":"E. Kharitonov, C. Macdonald, P. Serdyukov, I. Ounis","doi":"10.1145/2484028.2484041","DOIUrl":null,"url":null,"abstract":"Query suggestion or auto-completion mechanisms are widely used by search engines and are increasingly attracting interest from the research community. However, the lack of commonly accepted evaluation methodology and metrics means that it is not possible to compare results and approaches from the literature. Moreover, often the metrics used to evaluate query suggestions tend to be an adaptation from other domains without a proper justification. Hence, it is not necessarily clear if the improvements reported in the literature would result in an actual improvement in the users' experience. Inspired by the cascade user models and state-of-the-art evaluation metrics in the web search domain, we address the query suggestion evaluation, by first studying the users behaviour from a search engine's query log and thereby deriving a new family of user models describing the users interaction with a query suggestion mechanism. Next, assuming a query log-based evaluation approach, we propose two new metrics to evaluate query suggestions, pSaved and eSaved. Both metrics are parameterised by a user model. pSaved is defined as the probability of using the query suggestions while submitting a query. eSaved equates to the expected relative amount of effort (keypresses) a user can avoid due to the deployed query suggestion mechanism. Finally, we experiment with both metrics using four user model instantiations as well as metrics previously used in the literature on a dataset of 6.1M sessions. Our results demonstrate that pSaved and eSaved show the best alignment with the users satisfaction amongst the considered metrics.","PeriodicalId":178818,"journal":{"name":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2484028.2484041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25

Abstract

Query suggestion or auto-completion mechanisms are widely used by search engines and are increasingly attracting interest from the research community. However, the lack of a commonly accepted evaluation methodology and metrics means that it is not possible to compare results and approaches from the literature. Moreover, the metrics used to evaluate query suggestions often tend to be adaptations from other domains without proper justification. Hence, it is not necessarily clear whether the improvements reported in the literature would result in an actual improvement in the users' experience. Inspired by cascade user models and state-of-the-art evaluation metrics in the web search domain, we address query suggestion evaluation by first studying users' behaviour from a search engine's query log, and thereby deriving a new family of user models describing users' interaction with a query suggestion mechanism. Next, assuming a query log-based evaluation approach, we propose two new metrics to evaluate query suggestions, pSaved and eSaved. Both metrics are parameterised by a user model. pSaved is defined as the probability of using the query suggestions while submitting a query. eSaved equates to the expected relative amount of effort (keypresses) a user can avoid due to the deployed query suggestion mechanism. Finally, we experiment with both metrics using four user model instantiations, as well as metrics previously used in the literature, on a dataset of 6.1M sessions. Our results demonstrate that pSaved and eSaved show the best alignment with user satisfaction amongst the considered metrics.
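To make the two metrics concrete, the sketch below computes simplified pSaved- and eSaved-style scores for a single query. It assumes a deterministic user model in which the user scans the suggestion list after every keypress and accepts the target query as soon as it appears; this is a simplification of the probabilistic user models the paper parameterises the metrics with. The `suggest` callable and the `toy_suggest` helper are hypothetical stand-ins for a suggestion mechanism under evaluation, not part of the paper.

```python
# A minimal sketch (not the paper's exact formulation): per-query
# pSaved/eSaved-style scores under a deterministic user model where the
# user examines suggestions after every keypress and accepts the target
# query the moment it is shown.
from typing import Callable, List, Tuple


def psaved_esaved(query: str,
                  suggest: Callable[[str], List[str]]) -> Tuple[float, float]:
    """Return (pSaved-like, eSaved-like) scores for one target query."""
    n = len(query)
    if n == 0:
        return 0.0, 0.0
    for typed in range(n + 1):           # number of characters typed so far
        prefix = query[:typed]
        if query in suggest(prefix):     # user spots the target and accepts it
            # Fraction of keypresses the suggestion mechanism saved.
            return 1.0, 1.0 - typed / n
    return 0.0, 0.0                      # user had to type the full query


# Toy usage with a hypothetical static suggester returning top-3 matches.
def toy_suggest(prefix: str) -> List[str]:
    if not prefix:
        return []                        # no suggestions before the first keypress
    candidates = ["query suggestion", "query log", "query expansion"]
    return [c for c in candidates if c.startswith(prefix)][:3]


print(psaved_esaved("query log", toy_suggest))   # -> (1.0, 0.888...)
```

Averaging these per-query scores over all queries in a log would yield collection-level pSaved and eSaved estimates under this simplified model; the paper's actual metrics instead take the expectation under the fitted user model.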