Real-time Event-based News Suggestion for Wikipedia Pages from News Streams
Lijun Lyu, B. Fetahu
Companion Proceedings of the The Web Conference 2018, published 2018-04-23
DOI: 10.1145/3184558.3191642 (https://doi.org/10.1145/3184558.3191642)
Citations: 3
Abstract
Wikipedia is one of the most visited resources on the Web, and it is used extensively as the main source of information for applications such as Web search and question answering. This is mostly attributed to Wikipedia's coverage of topics and real-world entities, and to the fact that Wikipedia articles are constantly updated with new and emerging facts. However, only a small fraction of articles are considered to be of good quality; the large majority are incomplete or have other quality issues. A strong quality indicator is the presence of external references from third-party sources (e.g. news sources), as suggested by Wikipedia's verifiability principle. Even for the references that already exist in Wikipedia, there is an inherent lag between the publication time of a cited resource and the time it is cited in a Wikipedia article. We propose near real-time suggestion of news references for Wikipedia from a daily news stream. We model daily news as specific events, each spanning from a day up to a year. From these we construct an event chain, which we use to determine when the information in an event has converged. Once it has, a learning-to-rank approach suggests the most authoritative and complete news article to the Wikipedia articles involved in that event. We evaluate our news suggestion approach on a set of 41 events extracted from Wikipedia's Current Events portal, and on a new corpus of daily news from 2016 to 2017 containing more than 14 million news articles. We are able to suggest news articles to Wikipedia pages with an overall accuracy of MAP=0.77 and with minimal lag with respect to the publication time of the news article.
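The accuracy figure above is reported as MAP, which conventionally stands for mean average precision over a set of ranked lists. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes binary relevance labels and that every relevant article appears somewhere in its ranked list; the function names and the sample data are hypothetical.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one ranked list.

    relevance[i] is 1 if the suggestion at rank i + 1 is relevant, else 0.
    Assumption: all relevant items are present in the list, so the sum is
    normalised by the number of relevant items found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Mean of the per-list average precision; 0.0 for an empty input."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical labels for two ranked suggestion lists (e.g. one per page).
example_rankings = [[1, 0, 1], [0, 1]]
example_map = mean_average_precision(example_rankings)
```

In this toy input the first list scores (1/1 + 2/3) / 2 and the second scores 1/2, so the mean is 2/3. Under this definition, a value such as 0.77 indicates that relevant articles tend to appear near the top of each ranked list.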