The Summary Evaluation Task in the MultiLing - RANLP 2019 Workshop

George Giannakopoulos (SciFY NPC; NCSR Demokritos, Greece), Nikiforos Pittaras (DIT, NKUA; NCSR Demokritos, Greece)
DOI: 10.26615/978-954-452-058-8_003
Published in: Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources associated with RANLP 2019
Publication date: 2019-12-15

Abstract

This report covers the summarization evaluation task, proposed to the summarization community via the MultiLing 2019 Workshop of the RANLP 2019 conference. The task aims to encourage the development of automatic summarization evaluation methods closely aligned with manual, human-authored summary grades and judgements. A multilingual setting is adopted, building upon a corpus of Wikinews articles across 6 languages (English, Arabic, Romanian, Greek, Spanish and Czech). The evaluation utilizes human (golden) and machine-generated (peer) summaries, which have been assigned human evaluation scores from previous MultiLing tasks. Using these resources, the original corpus is augmented with synthetic data, combining summary texts under three different strategies (reorder, merge and replace), each engineered to introduce noise in the summary in a controlled and quantifiable way. We estimate that the utilization of such data can extract and highlight useful attributes of summary quality estimation, aiding the creation of data-driven automatic methods with an increased correlation to human summary evaluations across domains and languages. This paper provides a brief description of the summary evaluation task, the data generation protocol and the resources made available by the MultiLing community, towards improving automatic summarization evaluation.
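The three augmentation strategies named in the abstract (reorder, merge, and replace) can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the task's actual generation code: the function names, parameters, and the exact way noise is parameterized are assumptions; the paper's protocol may differ in how it selects sentences and quantifies the injected noise.

```python
import random

def reorder(sentences, n_swaps=1, seed=0):
    """Swap n_swaps random sentence pairs: controlled ordering noise."""
    rng = random.Random(seed)
    out = list(sentences)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def merge(sentences, other, n_insert=1, seed=0):
    """Insert n_insert sentences drawn from another summary."""
    rng = random.Random(seed)
    out = list(sentences)
    for s in rng.sample(other, min(n_insert, len(other))):
        out.insert(rng.randrange(len(out) + 1), s)
    return out

def replace(sentences, other, n_replace=1, seed=0):
    """Overwrite n_replace sentences with sentences from another summary."""
    rng = random.Random(seed)
    out = list(sentences)
    idxs = rng.sample(range(len(out)), min(n_replace, len(out)))
    for i, s in zip(idxs, rng.sample(other, len(idxs))):
        out[i] = s
    return out
```

Because each function takes an explicit count (`n_swaps`, `n_insert`, `n_replace`) and a seed, the amount of noise injected into a summary is controlled and quantifiable, which is the property the task requires so that degraded summaries can be paired with proportionally lower quality targets.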