Identifying and labeling search tasks via query-based Hawkes processes

Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, H. Zha
Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014-08-24. DOI: 10.1145/2623330.2623679
Citations: 60

Abstract

We consider a search task to be a set of queries that serve the same user information need. Analyzing search tasks in user query streams plays an important role in building modern tools that improve search engine performance. In this paper, we propose a probabilistic method for identifying and labeling search tasks based on two intuitive observations: queries that users issue close together in time, across many query sequences, are likely to belong to the same search task; and different users with the same information need tend to submit topically coherent queries. To capture these intuitions, we directly model query temporal patterns using a special class of point processes called Hawkes processes, and combine topic models with Hawkes processes to identify and label search tasks simultaneously. Essentially, the Hawkes processes use their self-exciting property to identify search tasks when influence exists among an individual user's sequence of queries, while the topic model exploits query co-occurrence across different users to discover the latent information needed for labeling search tasks. More importantly, within the unified model the Hawkes processes and the topic model reinforce each other, which improves the performance of both. We evaluate our method on both synthetic data and real-world query log data. In addition, we apply our model to query clustering and search task identification. Compared with state-of-the-art methods, the results show that our approach yields consistent and promising improvements.
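The abstract describes the self-exciting property only in words. As a rough illustration, and explicitly not the authors' model (which couples the Hawkes component with a topic model, works across many users, and learns its parameters from data), the sketch below assumes a single user's sorted query timestamps, a univariate Hawkes process with an exponential kernel, lambda(t) = mu + sum over t_i < t of alpha * exp(-beta * (t - t_i)), and hand-picked values for mu, alpha and beta. It uses the standard branching-structure view: each query is attributed either to the background rate mu (it starts a new task) or to the earlier query that most likely triggered it (it joins that query's task). The function names and the toy timestamps are hypothetical.

```python
import math
from typing import List, Sequence


def intensity(t: float, history: Sequence[float], mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity of a univariate Hawkes process with an exponential kernel:
    lambda(t) = mu + sum over t_i < t of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)


def parent_probabilities(j: int, times: Sequence[float], mu: float, alpha: float, beta: float) -> List[float]:
    """Posterior over the 'parent' of query j under the branching structure.
    Entry 0 is the background rate (query j starts a new task); entry i + 1 is
    earlier query i (query j was triggered by it). Terms are normalized to sum
    to 1; with strictly increasing times the normalizer equals lambda(t_j)."""
    tj = times[j]
    terms = [mu] + [alpha * math.exp(-beta * (tj - times[i])) for i in range(j)]
    total = sum(terms)
    return [term / total for term in terms]


def segment_tasks(times: Sequence[float], mu: float, alpha: float, beta: float) -> List[int]:
    """Hypothetical helper: assign a task id to each query of one user.
    A query whose most probable parent is the background starts a new task;
    otherwise it inherits the task of the earlier query that most likely triggered it."""
    if any(b < a for a, b in zip(times, times[1:])):
        raise ValueError("timestamps must be sorted in non-decreasing order")
    task_of: List[int] = []
    next_task = 0
    for j in range(len(times)):
        probs = parent_probabilities(j, times, mu, alpha, beta)
        # max() returns the first maximum, so ties go to the background (new task).
        parent = max(range(len(probs)), key=probs.__getitem__)
        if parent == 0:
            task_of.append(next_task)
            next_task += 1
        else:
            task_of.append(task_of[parent - 1])
    return task_of


if __name__ == "__main__":
    # Made-up query timestamps (minutes) for a single user.
    times = [0, 1, 2, 60, 61, 200]
    print(segment_tasks(times, mu=0.01, alpha=0.5, beta=0.1))  # [0, 0, 0, 1, 1, 2]
```

On this toy timeline, queries a minute apart excite each other strongly and chain into one task, while after gaps of an hour or more the decayed excitation falls below mu, so the background term wins and a new task begins. The topical signal that the paper's topic model contributes, and the mutual reinforcement between the two components, are deliberately left out of this sketch.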