Real-time Event-based News Suggestion for Wikipedia Pages from News Streams

Lijun Lyu, B. Fetahu
Companion Proceedings of The Web Conference 2018, published 2018-04-23
DOI: 10.1145/3184558.3191642 (https://doi.org/10.1145/3184558.3191642)
Citations: 3

Abstract

Wikipedia is one of the most visited resources on the Web and is used extensively as the main source of information for applications such as Web search and question answering. This is mostly attributed to Wikipedia's broad coverage of topics and real-world entities, and to the fact that its articles are constantly updated with new and emerging facts. However, only a small fraction of articles are considered to be of good quality; the vast majority are incomplete or have other quality issues. A strong quality indicator is the presence of external references from third-party sources (e.g. news sources), as suggested by Wikipedia's verifiability principle. Even for existing references, there is an inherent lag between the publication time of a cited resource and the time it is cited in a Wikipedia article. We propose near real-time suggestion of news references for Wikipedia from a daily news stream. We model daily news as specific events, spanning from a single day up to a year. From these events we construct an event chain, from which we determine when the information in an event has converged; we then use a learning-to-rank approach to suggest the most authoritative and complete news article to the Wikipedia articles involved in that event. We evaluate our news suggestion approach on a set of 41 events extracted from the Wikipedia Current Events portal, and on a new corpus of daily news published between 2016 and 2017 comprising more than 14 million news articles. We are able to suggest news articles to Wikipedia pages with an overall mean average precision (MAP) of 0.77 and with minimal lag with respect to the publication time of the news article.
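The reported MAP=0.77 is a mean average precision over ranked suggestion lists rather than a plain accuracy. As a hedged illustration of what that figure measures, the sketch below computes MAP under stated assumptions: each query is taken to be one (event, Wikipedia page) pair, the system returns a ranked list of candidate news article IDs, and a set of ground-truth relevant articles is known. The IDs, the query grouping and the normalisation by the total number of relevant items (the common TREC-style definition) are assumptions; the abstract does not specify the paper's exact evaluation protocol, such as a rank cutoff.

```python
from typing import Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP for one query: sum of precision@k at each rank k holding a relevant item,
    divided by the total number of relevant items (unretrieved ones contribute 0)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: the mean of per-query AP values.
    Each run is a hypothetical (ranked suggestions, relevant articles) pair."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Toy example with made-up article IDs (not data from the paper).
runs = [
    (["a1", "a2", "a3", "a4"], {"a1", "a3"}),  # AP = (1/1 + 2/3) / 2 = 5/6
    (["b1", "b2", "b3"], {"b2"}),              # AP = (1/2) / 1 = 1/2
]
print(round(mean_average_precision(runs), 4))  # 0.6667
```

Because AP rewards placing relevant articles near the top of each list, a MAP close to 1 means the most suitable news article is usually ranked first for a page, which is the behaviour a reference-suggestion system needs.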