Automatic Evaluation of Human Translation: BLEU vs. METEOR

Q2 Arts and Humanities
H. Chung
DOI: 10.1515/les-2020-0009
Journal: Lebende Sprachen, Vol. 65, No. 1, pp. 181-205
Published: 2020-04-01 (Journal Article)
Citations: 5

Automatische Evaluation der Humanübersetzung: BLEU vs. METEOR
Abstract: Human evaluation (HE) of translation is generally considered valid, but it requires a great deal of effort. Automatic evaluation (AE), which assesses the quality of machine translations, can be done easily, but it still requires validation. This study addresses the questions of whether and how AE can be used for human translations. For this purpose, AE formulas and HE criteria were compared in order to examine the validity of AE. In the empirical part of the study, 120 translations were evaluated by professional translators as well as by two representative AE systems, BLEU and METEOR. The correlations between AE and HE were relatively high in the overall analysis, at 0.849** (BLEU) and 0.862** (METEOR), but in the ratings of the individual texts AE and HE diverged substantially: the AE-HE correlations were often below 0.3 or even negative. Ultimately, the results indicate that neither METEOR nor BLEU can be used to assess human translation at this stage. However, the paper suggests three ways of applying AE to compensate for the weaknesses of HE.
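The headline figures above are Pearson correlations between automatic metric scores and human ratings across the evaluated translations. As a minimal illustration of that correlation step (the scores below are invented placeholders, not the study's data), a plain-Python sketch:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for five translations (illustrative only):
human = [4.5, 3.0, 2.0, 4.0, 1.5]        # professional translators' ratings
bleu  = [0.62, 0.41, 0.35, 0.55, 0.20]   # automatic metric scores (e.g. BLEU)

r = pearson(human, bleu)
print(f"Pearson r = {r:.3f}")
```

A high overall r (as in the study's 0.849/0.862) can still mask weak or negative per-text correlations, which is exactly the divergence the paper reports; in practice one would compute r separately for each text's set of translations.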
Source journal: Lebende Sprachen (Arts and Humanities - Language and Linguistics)
CiteScore: 0.70
Self-citation rate: 0.00%
Articles per year: 10