Automatic Evaluation of Human Translation: BLEU vs. METEOR

Q2 Arts and Humanities
H. Chung
DOI: 10.1515/les-2020-0009
Journal: Lebende Sprachen, Vol. 65, No. 1, pp. 181-205
Published: 2020-04-01 (Journal Article)
Citations: 5

Automatische Evaluation der Humanübersetzung: BLEU vs. METEOR
Abstract: Human evaluation (HE) of translation is generally considered valid, but it requires a great deal of effort. Automatic evaluation (AE), which assesses the quality of machine translations, can be done easily, but it still requires validation. This study addresses the questions of whether and how AE can be used for human translations. For this purpose, AE formulas and HE criteria were compared in order to examine the validity of AE. In the empirical part of the study, 120 translations were evaluated by professional translators as well as by two representative AE systems, BLEU and METEOR. The correlations between AE and HE were relatively high in the overall analysis, at 0.849** (BLEU) and 0.862** (METEOR), but in the ratings of the individual texts AE and HE diverged substantially: the AE-HE correlations were often below 0.3 or even negative. Ultimately, the results indicate that neither METEOR nor BLEU can be used to assess human translation at this stage. However, the paper suggests three ways of applying AE to compensate for the weaknesses of HE.
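The headline figures above are Pearson correlations between automatic metric scores and human ratings across the evaluated translations. As a minimal illustration of that correlation step (the scores below are invented placeholders, not the study's data), a plain-Python sketch:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for five translations (illustrative only):
human = [4.5, 3.0, 2.0, 4.0, 1.5]        # professional translators' ratings
bleu  = [0.62, 0.41, 0.35, 0.55, 0.20]   # automatic metric scores (e.g. BLEU)

r = pearson(human, bleu)
print(f"Pearson r = {r:.3f}")
```

A high overall r (as in the study's 0.849/0.862) can still mask weak or negative per-text correlations, which is exactly the divergence the paper reports; in practice one would compute r separately for each text's set of translations.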
Source journal: Lebende Sprachen (Arts and Humanities - Language and Linguistics)
CiteScore: 0.70
Self-citation rate: 0.00%
Articles per year: 10