Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. Pub Date: 2016-02-08. DOI: 10.1145/2835776.2835825

Flávio Martins, João Magalhães, Jamie Callan


Citations: 13

Abstract

On Twitter and other microblogging services, the content generated by the crowd is often biased towards immediacy: what is happening now. Prompted by commentary and information propagating through multiple media, users on the Web interact with and post about newsworthy topics, giving rise to trending topics. This paper proposes to leverage the behavioral dynamics of users to estimate the most relevant time periods for a topic. Our hypothesis stems from the observation that when a real-world event occurs, it usually has peak times on the Web: a higher volume of tweets, new visits and edits to related Wikipedia articles, and news published about the event. We propose a novel time-aware ranking model that leverages multiple sources of crowd signals. Our approach makes two main contributions. First, a unifying approach that, given a query q, mines and represents temporal evidence from multiple sources of crowd signals, which allows us to predict the temporal relevance of documents for q. Second, a principled retrieval model that integrates temporal signals in a learning-to-rank framework to rank results according to the predicted temporal relevance. Evaluation on the TREC 2013 and 2014 Microblog track datasets demonstrates that the proposed model achieves a relative improvement of 13.2% over lexical retrieval models and 6.2% over a learning-to-rank baseline.
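As a rough illustration of the idea of turning crowd activity into temporal ranking features, the sketch below builds a smoothed per-day activity distribution for each signal source and reads off one feature per source at a document's posting day. It is a minimal sketch under stated assumptions, not the model from the paper: the function names, the add-one smoothing, the day-level binning, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List


def temporal_density(event_days: List[int], num_days: int,
                     smoothing: float = 1.0) -> List[float]:
    """Turn day indices (one per crowd event) into a smoothed probability
    distribution over days 0..num_days-1. Smoothing is an assumption here."""
    counts = Counter(d for d in event_days if 0 <= d < num_days)
    total = sum(counts.values()) + smoothing * num_days
    return [(counts.get(day, 0) + smoothing) / total for day in range(num_days)]


def temporal_features(doc_day: int,
                      densities: Dict[str, List[float]]) -> Dict[str, float]:
    """One feature per crowd source: the estimated density on the document's day."""
    return {f"temporal_{source}": density[doc_day]
            for source, density in densities.items()}


# Hypothetical activity for one query over a 5-day window (made-up day indices).
signals = {
    "tweets": [2, 2, 2, 3, 1],
    "wiki_edits": [2, 3, 3],
    "news": [2],
}
densities = {source: temporal_density(days, num_days=5)
             for source, days in signals.items()}

peak_doc = temporal_features(2, densities)   # posted on the busiest day
quiet_doc = temporal_features(0, densities)  # posted before the activity peak
```

In a learning-to-rank setting, features such as these would typically be appended to each document's lexical feature vector so that the ranker can learn how much weight each crowd source deserves.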

