Automatic Correction of Human Translations

Jessy Lin, G. Kovács, Aditya Shastry, Joern Wuebker, John DeNero
{"title":"Automatic Correction of Human Translations","authors":"Jessy Lin, G. Kovács, Aditya Shastry, Joern Wuebker, John DeNero","doi":"10.48550/arXiv.2206.08593","DOIUrl":null,"url":null,"abstract":"We introduce translation error correction (TEC), the task of automatically correcting human-generated translations.Imperfections in machine translations (MT) have long motivated systems for improving translations post-hoc with automatic post-editing.In contrast, little attention has been devoted to the problem of automatically correcting human translations, despite the intuition that humans make distinct errors that machines would be well-suited to assist with, from typos to inconsistencies in translation conventions.To investigate this, we build and release the Aced corpus with three TEC datasets (available at: github.com/lilt/tec). We show that human errors in TEC exhibit a more diverse range of errors and far fewer translation fluency errors than the MT errors in automatic post-editing datasets, suggesting the need for dedicated TEC models that are specialized to correct human errors. We show that pre-training instead on synthetic errors based on human errors improves TEC F-score by as much as 5.1 points. We conducted a human-in-the-loop user study with nine professional translation editors and found that the assistance of our TEC system led them to produce significantly higher quality revised translations.","PeriodicalId":382084,"journal":{"name":"North American Chapter of the Association for Computational Linguistics","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"North American Chapter of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2206.08593","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

We introduce translation error correction (TEC), the task of automatically correcting human-generated translations.Imperfections in machine translations (MT) have long motivated systems for improving translations post-hoc with automatic post-editing.In contrast, little attention has been devoted to the problem of automatically correcting human translations, despite the intuition that humans make distinct errors that machines would be well-suited to assist with, from typos to inconsistencies in translation conventions.To investigate this, we build and release the Aced corpus with three TEC datasets (available at: github.com/lilt/tec). We show that human errors in TEC exhibit a more diverse range of errors and far fewer translation fluency errors than the MT errors in automatic post-editing datasets, suggesting the need for dedicated TEC models that are specialized to correct human errors. We show that pre-training instead on synthetic errors based on human errors improves TEC F-score by as much as 5.1 points. We conducted a human-in-the-loop user study with nine professional translation editors and found that the assistance of our TEC system led them to produce significantly higher quality revised translations.
人工翻译的自动校正
我们介绍翻译错误纠正(TEC),即自动纠正人工生成的翻译的任务。机器翻译(MT)的缺陷长期以来一直激励着通过自动后期编辑来改进翻译的系统。相比之下,很少有人关注自动纠正人类翻译的问题,尽管直觉认为人类会犯明显的错误,而机器将非常适合协助,从打字错误到翻译惯例的不一致。为了研究这个问题,我们构建并发布了带有三个TEC数据集的Aced语料库(可在:github.com/lilt/tec获得)。我们表明,与自动编辑后数据集中的翻译错误相比,TEC中的人为错误表现出更多样化的错误范围和更少的翻译流畅性错误,这表明需要专门用于纠正人为错误的专用TEC模型。我们表明,基于人为错误的合成错误的预训练将TEC f分数提高了5.1分。我们与9位专业翻译编辑进行了一项“人在循环”的用户研究,发现我们的TEC系统的帮助使他们产生了高质量的修订翻译。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信