Estimating Time Models for News Article Excerpts
Arunav Mishra, K. Berberich
Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, published 2016-10-24
DOI: 10.1145/2983323.2983802 (https://doi.org/10.1145/2983323.2983802)

It is often difficult to ground text to precise time intervals because of inherent uncertainty: temporal expressions at the year, month, and day granularities may be missing, or several of them may be present. We address the problem of estimating an excerpt-time model that captures the temporal scope of a given news article excerpt as a probability distribution over chronons (atomic units of time). To this end, we propose a semi-supervised distribution propagation framework that exploits redundancy in the data to improve the quality of the estimated time models. Our method constructs an event graph with excerpts as nodes and various inter-excerpt relations as edges. It then propagates the empirical excerpt-time models of temporally annotated excerpts to strongly related excerpts that lack annotations. For the experiments, we build a test query set by randomly sampling 100 Wikipedia events. For each query, we retrieve the top-10 documents with a standard text retrieval model; these contain 150 excerpts on average. Each temporally annotated excerpt among them serves as a gold standard. For a single query, the evaluation measures are first computed per gold-standard excerpt by comparing the time model estimated by our method with the empirical model derived from the excerpt's original temporal expressions; final scores are then averaged over all test queries. Experiments on the English Gigaword corpus show that our method estimates significantly better time models than several baselines from the literature.
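The text above does not spell out how an empirical excerpt-time model is built from an excerpt's temporal expressions. The following Python sketch shows one plausible construction under assumptions that are ours, not the paper's: chronons are days, a year-, month-, or day-level expression covers every day in its interval, and the excerpt's model is an equal-weight mixture of uniform distributions, one per expression. The names `expression_interval` and `empirical_time_model` are hypothetical.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta


def expression_interval(year, month=None, day=None):
    """Return the first and last day covered by a year-, month-, or day-level expression."""
    if month is None:  # year granularity, e.g. "2016"
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:  # month granularity, e.g. "October 2016"
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    single = date(year, month, day)  # day granularity
    return single, single


def empirical_time_model(expressions):
    """Build a distribution over day chronons from (year[, month[, day]]) tuples.

    Assumption (hypothetical, not from the paper): every expression gets equal
    mass, spread uniformly over the days it covers.
    """
    if not expressions:
        raise ValueError("an excerpt without temporal expressions has no empirical model")
    model = defaultdict(float)
    share = 1.0 / len(expressions)
    for expr in expressions:
        start, end = expression_interval(*expr)
        n_days = (end - start).days + 1
        for offset in range(n_days):
            model[start + timedelta(days=offset)] += share / n_days
    return dict(model)


# An excerpt mentioning "2016" and "24 October 2016":
model = empirical_time_model([(2016,), (2016, 10, 24)])
print(len(model))                     # 366 chronons (2016 is a leap year)
print(round(sum(model.values()), 9))  # 1.0
```

The coarse expression "2016" spreads its half of the mass thinly over the whole year, while the day-level expression concentrates its half on a single chronon, which illustrates the granularity-driven uncertainty the abstract describes.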
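The propagation step described above can be illustrated with a minimal sketch. It uses a generic label-propagation-style scheme swapped in for illustration, not the paper's actual update rule: annotated excerpts (seeds) keep their empirical models fixed, and every other excerpt repeatedly takes the edge-weighted average of its neighbors' distributions. The edge weights stand in for the unspecified inter-excerpt relations; `propagate`, `edges`, `seeds`, and `iterations` are hypothetical names.

```python
from collections import defaultdict


def propagate(edges, seeds, iterations=20):
    """Propagate time models from annotated excerpts (seeds) over an event graph.

    edges: {(u, v): weight}, treated as undirected; weights stand in for the
           strength of inter-excerpt relations (hypothetical).
    seeds: {node: {chronon: prob}} for temporally annotated excerpts; these are
           clamped and never updated.
    Returns {node: {chronon: prob}}; nodes never reached by any seed stay empty.
    """
    neighbors = defaultdict(list)
    for (u, v), weight in edges.items():
        neighbors[u].append((v, weight))
        neighbors[v].append((u, weight))
    nodes = set(neighbors) | set(seeds)
    models = {n: dict(seeds.get(n, {})) for n in nodes}

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            if n in seeds:  # annotated excerpts keep their empirical model
                updated[n] = models[n]
                continue
            mass = defaultdict(float)
            total_weight = 0.0
            for m, weight in neighbors[n]:
                if not models[m]:  # neighbor has no model yet; ignore it
                    continue
                total_weight += weight
                for chronon, p in models[m].items():
                    mass[chronon] += weight * p
            # A weighted average of normalized neighbor models is itself normalized.
            updated[n] = ({c: v / total_weight for c, v in mass.items()}
                          if total_weight > 0 else {})
        models = updated
    return models


edges = {("a", "c"): 1.0, ("b", "c"): 3.0, ("f", "g"): 1.0}
seeds = {"a": {"d1": 1.0}, "b": {"d2": 1.0}}
result = propagate(edges, seeds)
print(result["c"])  # {'d1': 0.25, 'd2': 0.75}
print(result["f"])  # {}
```

Excerpt `c` is linked more strongly to `b` than to `a`, so it inherits most of its mass from `b`'s chronon. Excerpts `f` and `g` have no path to any annotated excerpt and therefore receive no model in this sketch.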
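The evaluation protocol described above (score each gold-standard excerpt within a query, then average over queries) can be sketched as follows. The abstract does not name its evaluation measures, so Jensen-Shannon divergence is used here purely as a stand-in, and taking the mean of excerpt scores within a query before averaging across queries is our assumption. `js_divergence` and `evaluate` are hypothetical helpers.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse distributions."""
    support = set(p) | set(q)
    mid = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in support}

    def kl_to_mid(dist):
        return sum(w * math.log2(w / mid[t]) for t, w in dist.items() if w > 0)

    return 0.5 * kl_to_mid(p) + 0.5 * kl_to_mid(q)


def evaluate(queries):
    """Average a per-excerpt score within each query, then across queries.

    queries: one list per query of (estimated_model, empirical_model) pairs,
             one pair per gold-standard excerpt. Lower is better for JSD.
    The choice of JSD and of per-query means is a hypothetical stand-in.
    """
    per_query = []
    for pairs in queries:
        if not pairs:  # a query without gold-standard excerpts contributes nothing
            continue
        scores = [js_divergence(est, emp) for est, emp in pairs]
        per_query.append(sum(scores) / len(scores))
    if not per_query:
        raise ValueError("no query has a gold-standard excerpt")
    return sum(per_query) / len(per_query)


same, disjoint = {"x": 1.0}, {"y": 1.0}
print(evaluate([[(same, same)], [(same, disjoint), (same, same)]]))  # 0.25
```

Averaging within each query first gives every query equal weight in the final score, regardless of how many annotated excerpts its top-10 documents happen to contain.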