Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank

Flávio Martins, João Magalhães, Jamie Callan
{"title":"Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank","authors":"Flávio Martins, João Magalhães, Jamie Callan","doi":"10.1145/2835776.2835825","DOIUrl":null,"url":null,"abstract":"In Twitter, and other microblogging services, the generation of new content by the crowd is often biased towards immediacy: what is happening now. Prompted by the propagation of commentary and information through multiple mediums, users on the Web interact with and produce new posts about newsworthy topics and give rise to trending topics. This paper proposes to leverage on the behavioral dynamics of users to estimate the most relevant time periods for a topic. Our hypothesis stems from the fact that when a real-world event occurs it usually has peak times on the Web: a higher volume of tweets, new visits and edits to related Wikipedia articles, and news published about the event. In this paper, we propose a novel time-aware ranking model that leverages on multiple sources of crowd signals. Our approach builds on two major novelties. First, a unifying approach that given query q, mines and represents temporal evidence from multiple sources of crowd signals. This allows us to predict the temporal relevance of documents for query q. Second, a principled retrieval model that integrates temporal signals in a learning to rank framework, to rank results according to the predicted temporal relevance. Evaluation on the TREC 2013 and 2014 Microblog track datasets demonstrates that the proposed model achieves a relative improvement of 13.2% over lexical retrieval models and 6.2% over a learning to rank baseline.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"38 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2835776.2835825","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

In Twitter, and other microblogging services, the generation of new content by the crowd is often biased towards immediacy: what is happening now. Prompted by the propagation of commentary and information through multiple mediums, users on the Web interact with and produce new posts about newsworthy topics and give rise to trending topics. This paper proposes to leverage on the behavioral dynamics of users to estimate the most relevant time periods for a topic. Our hypothesis stems from the fact that when a real-world event occurs it usually has peak times on the Web: a higher volume of tweets, new visits and edits to related Wikipedia articles, and news published about the event. In this paper, we propose a novel time-aware ranking model that leverages on multiple sources of crowd signals. Our approach builds on two major novelties. First, a unifying approach that given query q, mines and represents temporal evidence from multiple sources of crowd signals. This allows us to predict the temporal relevance of documents for query q. Second, a principled retrieval model that integrates temporal signals in a learning to rank framework, to rank results according to the predicted temporal relevance. Evaluation on the TREC 2013 and 2014 Microblog track datasets demonstrates that the proposed model achieves a relative improvement of 13.2% over lexical retrieval models and 6.2% over a learning to rank baseline.
芭芭拉上了新闻:挖掘人群的行为,以便有时间意识地学习排名
在Twitter和其他微博服务中,大众生成的新内容往往倾向于即时性:现在正在发生的事情。在评论和信息通过多种媒介传播的推动下,网络上的用户与有新闻价值的话题互动并产生新的帖子,并产生热门话题。本文建议利用用户的行为动态来估计与主题最相关的时间段。我们的假设源于这样一个事实:当一个现实世界的事件发生时,它通常会在网络上出现高峰:tweet的数量增加,相关维基百科文章的新访问和编辑,以及关于该事件的新闻发布。在本文中,我们提出了一种新的时间感知排序模型,该模型利用了多个人群信号源。我们的方法建立在两个主要的新奇之处。首先,给出查询q的统一方法,从多个人群信号源中挖掘和表示时间证据。这使我们能够预测查询q的文档的时间相关性。其次,一个有原则的检索模型,将时间信号集成到一个学习排序框架中,根据预测的时间相关性对结果进行排序。对TREC 2013年和2014年微博轨迹数据集的评价表明,该模型比词汇检索模型和学习排名基线分别提高了13.2%和6.2%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信