Translation Performance from the User’s Perspective of Large Language Models and Neural Machine Translation Systems

Impact factor: 2.9; JCR Q3 (Computer Science, Information Systems)

Information (Switzerland). Publication date: 2023-10-19. DOI: 10.3390/info14100574

Jungha Son, Boyoung Kim



Abstract

The rapid global expansion of ChatGPT, which plays a crucial role in interactive knowledge sharing and translation, underscores the importance of comparative performance assessments in artificial intelligence (AI) technology. This study addresses that need by comparing the translation performance of large language models (LLMs) and neural machine translation (NMT) systems. To this end, the APIs of Google Translate, Microsoft Translator, and OpenAI’s ChatGPT were used with parallel corpora from the Workshop on Machine Translation (WMT) 2018 and 2020 benchmarks. Performance was analyzed across a variety of language pairs, translation directions, and reference token sizes using recognized evaluation metrics such as BLEU, chrF, and TER. The findings reveal that while Google Translate and Microsoft Translator generally surpass ChatGPT in BLEU, chrF, and TER scores, ChatGPT performs better on specific language pairs. Translations from non-English to English consistently yielded better results than translations from English to non-English across all three systems. Notably, translation performance improved as the token size increased, hinting at the potential benefits of training models on larger token sizes.
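The kind of analysis the abstract describes (scoring system output against references, then grouping by reference token size) can be sketched in a few lines. The following is a minimal illustration only, not the authors' pipeline: `simple_chrf` is a simplified chrF-style character n-gram F-score written for this example (it is not the reference chrF implementation), and the bucket boundaries, function names, and sample sentences are hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Count character n-grams of length n, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF-style score in [0, 1] (illustrative, not the official metric).

    Averages character n-gram precision and recall over n = 1..max_n,
    then combines them with an F-beta score that weights recall more.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this n-gram order is longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)


def mean_score_by_ref_length(pairs, boundaries=(10, 20)):
    """Average score per reference-length bucket (whitespace tokens).

    The boundaries are hypothetical; the study's actual bins are not
    given in the abstract.
    """
    buckets = defaultdict(list)
    for hyp, ref in pairs:
        n_tokens = len(ref.split())
        label = next((f"<{b}" for b in boundaries if n_tokens < b),
                     f">={boundaries[-1]}")
        buckets[label].append(simple_chrf(hyp, ref))
    return {label: sum(s) / len(s) for label, s in buckets.items()}


if __name__ == "__main__":
    # Made-up (hypothesis, reference) pairs, for illustration only.
    sample = [
        ("the cat sat on the mat", "the cat sat on a mat"),
        ("he go to school", "he goes to school every day"),
    ]
    print(mean_score_by_ref_length(sample))
```

For real evaluations, an established metric implementation would normally be preferred over a hand-rolled score, since published BLEU, chrF, and TER figures depend on tokenization and smoothing details that this sketch does not reproduce.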


