基于语义相似度的新闻文摘评价

Figen Beken Fikri, Kemal Oflazer, B. Yanikoglu
{"title":"基于语义相似度的新闻文摘评价","authors":"Figen Beken Fikri, Kemal Oflazer, B. Yanikoglu","doi":"10.18653/v1/2021.gem-1.3","DOIUrl":null,"url":null,"abstract":"ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for the evaluation of abstractive summarization systems as it relies on lexical overlap between the gold standard and the generated summaries. This limitation becomes more apparent for agglutinative languages with very large vocabularies and high type/token ratios. In this paper, we present semantic similarity models for Turkish and apply them as evaluation metrics for an abstractive summarization task. To achieve this, we translated the English STSb dataset into Turkish and presented the first semantic textual similarity dataset for Turkish as well. We showed that our best similarity models have better alignment with average human judgments compared to ROUGE in both Pearson and Spearman correlations.","PeriodicalId":431658,"journal":{"name":"Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Semantic Similarity Based Evaluation for Abstractive News Summarization\",\"authors\":\"Figen Beken Fikri, Kemal Oflazer, B. Yanikoglu\",\"doi\":\"10.18653/v1/2021.gem-1.3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for the evaluation of abstractive summarization systems as it relies on lexical overlap between the gold standard and the generated summaries. This limitation becomes more apparent for agglutinative languages with very large vocabularies and high type/token ratios. In this paper, we present semantic similarity models for Turkish and apply them as evaluation metrics for an abstractive summarization task. To achieve this, we translated the English STSb dataset into Turkish and presented the first semantic textual similarity dataset for Turkish as well. We showed that our best similarity models have better alignment with average human judgments compared to ROUGE in both Pearson and Spearman correlations.\",\"PeriodicalId\":431658,\"journal\":{\"name\":\"Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)\",\"volume\":\"94 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2021.gem-1.3\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2021.gem-1.3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

摘要

ROUGE是一种广泛应用于文本摘要的评价度量。然而,它不适合评估抽象摘要系统,因为它依赖于金标准和生成的摘要之间的词汇重叠。对于具有非常大的词汇表和高类型/标记比率的粘合语言,这种限制变得更加明显。在本文中,我们提出了土耳其语的语义相似度模型,并将其作为抽象摘要任务的评估指标。为了实现这一目标,我们将英文STSb数据集翻译成土耳其语,并提出了土耳其语的第一个语义文本相似度数据集。我们发现,在Pearson和Spearman相关性中,与ROUGE相比,我们的最佳相似性模型与人类平均判断有更好的一致性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Semantic Similarity Based Evaluation for Abstractive News Summarization
ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for the evaluation of abstractive summarization systems as it relies on lexical overlap between the gold standard and the generated summaries. This limitation becomes more apparent for agglutinative languages with very large vocabularies and high type/token ratios. In this paper, we present semantic similarity models for Turkish and apply them as evaluation metrics for an abstractive summarization task. To achieve this, we translated the English STSb dataset into Turkish and presented the first semantic textual similarity dataset for Turkish as well. We showed that our best similarity models have better alignment with average human judgments compared to ROUGE in both Pearson and Spearman correlations.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信