Estimating Time Models for News Article Excerpts

Arunav Mishra, K. Berberich
{"title":"Estimating Time Models for News Article Excerpts","authors":"Arunav Mishra, K. Berberich","doi":"10.1145/2983323.2983802","DOIUrl":null,"url":null,"abstract":"It is often difficult to ground text to precise time intervals due to the inherent uncertainty arising from either missing or multiple expressions at year, month, and day time granularities. We address the problem of estimating an excerpt-time model capturing the temporal scope of a given news article excerpt as a probability distribution over chronons. For this, we propose a semi-supervised distribution propagation framework that leverages redundancy in the data to improve the quality of estimated time models. Our method generates an event graph with excerpts as nodes and models various inter-excerpt relations as edges. It then propagates empirical excerpt-time models estimated for temporally annotated excerpts, to those that are strongly related but miss annotations. In our experiments, we first generate a test query set by randomly sampling 100 Wikipedia events as queries. For each query, making use of a standard text retrieval model, we then obtain top-10 documents with an average of 150 excerpts. From these, each temporally annotated excerpt is considered as gold standard. The evaluation measures are first computed for each gold standard excerpt for a single query, by comparing the estimated model with our method to the empirical model from the original expressions. Final scores are reported by averaging over all the test queries. Experiments on the English Gigaword corpus show that our method estimates significantly better time models than several baselines taken from the literature.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2983323.2983802","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

It is often difficult to ground text to precise time intervals due to the inherent uncertainty arising from either missing or multiple expressions at year, month, and day time granularities. We address the problem of estimating an excerpt-time model capturing the temporal scope of a given news article excerpt as a probability distribution over chronons. For this, we propose a semi-supervised distribution propagation framework that leverages redundancy in the data to improve the quality of estimated time models. Our method generates an event graph with excerpts as nodes and models various inter-excerpt relations as edges. It then propagates empirical excerpt-time models estimated for temporally annotated excerpts, to those that are strongly related but miss annotations. In our experiments, we first generate a test query set by randomly sampling 100 Wikipedia events as queries. For each query, making use of a standard text retrieval model, we then obtain top-10 documents with an average of 150 excerpts. From these, each temporally annotated excerpt is considered as gold standard. The evaluation measures are first computed for each gold standard excerpt for a single query, by comparing the estimated model with our method to the empirical model from the original expressions. Final scores are reported by averaging over all the test queries. Experiments on the English Gigaword corpus show that our method estimates significantly better time models than several baselines taken from the literature.
估计新闻文章摘录的时间模型
由于在年、月和日时间粒度上缺少或多个表达式所产生的固有不确定性,通常很难将文本与精确的时间间隔联系起来。我们解决了估计摘录时间模型的问题,该模型捕获给定新闻文章摘录的时间范围作为时间轴上的概率分布。为此,我们提出了一种半监督分布传播框架,利用数据中的冗余来提高估计时间模型的质量。我们的方法生成一个以摘录为节点的事件图,并将各种摘录之间的关系建模为边。然后,它将为时间注释的摘录估计的经验摘录时间模型传播到那些强烈相关但没有注释的摘录。在我们的实验中,我们首先通过随机抽取100个维基百科事件作为查询来生成一个测试查询集。对于每个查询,使用标准文本检索模型,然后我们获得平均150个摘录的前10个文档。从这些中,每一个临时注释的摘录都被认为是黄金标准。首先对单个查询的每个金标准摘录计算评估度量,通过将我们的方法估计的模型与原始表达式的经验模型进行比较。最终分数是通过对所有测试查询的平均值来报告的。在英语Gigaword语料库上的实验表明,我们的方法估计的时间模型比从文献中获取的几个基线要好得多。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信