Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios

European Association for Machine Translation Conferences/Workshops. Pub Date: 2023-05-31. DOI: 10.48550/arXiv.2305.19757

Mălina Chichirău, Rik van Noord, Antonio Toral


Citations: 0

Abstract

We tackle the task of automatically discriminating between human and machine translations. As opposed to most previous work, we perform experiments in a multilingual setting, considering multiple languages and multilingual pretrained language models. We show that a classifier trained on parallel data with a single source language (in our case German–English) can still perform well on English translations that come from different source languages, even when the machine translations were produced by other systems than the one it was trained on. Additionally, we demonstrate that incorporating the source text in the input of a multilingual classifier improves (i) its accuracy and (ii) its robustness on cross-system evaluation, compared to a monolingual classifier. Furthermore, we find that using training data from multiple source languages (German, Russian and Chinese) tends to improve the accuracy of both monolingual and multilingual classifiers. Finally, we show that bilingual classifiers and classifiers trained on multiple source languages benefit from being trained on longer text sequences, rather than on sentences.
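
The abstract contrasts two input designs (a monolingual classifier that sees only the translation, and a classifier that also receives the source text) and two input granularities (single sentences versus longer text sequences). The following is a minimal sketch of how such inputs might be assembled, not the authors' implementation. The names `build_input` and `group_sentences`, the `</s>` separator string, and the whitespace-based word count are all assumptions made for illustration; a real setup would rely on the chosen pretrained model's tokenizer for separators and length limits.

```python
from typing import List, Optional

# Assumed separator string; a real tokenizer would insert its own special tokens.
SEP = " </s> "


def build_input(translation: str, source: Optional[str] = None) -> str:
    """Return the classifier input text.

    Without a source, the input is the translation alone (monolingual case).
    With a source, the source text is placed before the translation.
    """
    if source is None:
        return translation
    return source + SEP + translation


def group_sentences(sentences: List[str], max_words: int) -> List[str]:
    """Greedily merge consecutive sentences into longer chunks.

    Each chunk holds at most max_words whitespace-separated words, except
    that a single sentence longer than max_words is kept whole.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    print(build_input("Hello world."))
    print(build_input("Hello world.", source="Hallo Welt."))
    print(group_sentences(["a b", "c d", "e f g"], max_words=4))
```

Either kind of string would then be labeled as human or machine translation and passed to a classifier; the label scheme and the model itself are outside this sketch.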
