Automatic Correction of Human Translations

North American Chapter of the Association for Computational Linguistics | Pub Date: 2022-06-17 | DOI: 10.48550/arXiv.2206.08593

Jessy Lin, G. Kovács, Aditya Shastry, Joern Wuebker, John DeNero


Citations: 3

Abstract

We introduce translation error correction (TEC), the task of automatically correcting human-generated translations. Imperfections in machine translations (MT) have long motivated systems for improving translations post-hoc with automatic post-editing. In contrast, little attention has been devoted to the problem of automatically correcting human translations, despite the intuition that humans make distinct errors that machines would be well-suited to assist with, from typos to inconsistencies in translation conventions. To investigate this, we build and release the Aced corpus with three TEC datasets (available at: github.com/lilt/tec). We show that human errors in TEC exhibit a more diverse range of errors and far fewer translation fluency errors than the MT errors in automatic post-editing datasets, suggesting the need for dedicated TEC models that are specialized to correct human errors. We show that pre-training instead on synthetic errors based on human errors improves TEC F-score by as much as 5.1 points. We conducted a human-in-the-loop user study with nine professional translation editors and found that the assistance of our TEC system led them to produce significantly higher quality revised translations.
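The abstract reports TEC quality as an F-score but does not say how it is computed. As a rough illustration only, the sketch below assumes a common convention from error-correction work: extract token-level edits from the system output and from a reference correction, then score the overlap with F-beta. The function names `extract_edits` and `f_beta`, the choice of beta = 0.5, the use of Python's `difflib` for alignment, and the toy English sentences are all assumptions made for this example, not details taken from the paper or the Aced corpus.

```python
from difflib import SequenceMatcher


def extract_edits(original, revised):
    """Return the set of token-level edits that turn `original` into `revised`.

    Each edit is (start, end, replacement): the token span [start, end) of the
    original sentence and the text that replaces it. Hypothetical helper.
    """
    src, tgt = original.split(), revised.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep replace, insert, and delete operations
            edits.add((i1, i2, " ".join(tgt[j1:j2])))
    return edits


def f_beta(predicted, gold, beta=0.5):
    """F-beta over two edit sets; beta < 1 weights precision above recall."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy example (invented): the human translation contains two errors,
# and the system fixes only one of them.
human_translation = "She go to school every days"
reference_fix = "She goes to school every day"
system_output = "She goes to school every days"

gold_edits = extract_edits(human_translation, reference_fix)
system_edits = extract_edits(human_translation, system_output)
score = f_beta(system_edits, gold_edits)
```

In this toy case the system's single edit is correct (precision 1.0) but covers only one of the two reference edits (recall 0.5). Scoring edits rather than whole sentences gives partial credit for such outputs, which is one reason edit-based metrics are popular for correction tasks.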

