On English-Chinese Neural Machine Translation leveraging Transformer model

Subrota Kumar Mondal, Yijun Chen, Yuning Cheng, Hong-Ning Dai, Syed B. Alam, H.M. Dipu Kabir
Natural Language Processing Journal, Volume 12, Article 100166. Published 2025-06-23. DOI: 10.1016/j.nlp.2025.100166. URL: https://www.sciencedirect.com/science/article/pii/S2949719125000421

Abstract

In today’s era of globalization, cross-cultural communication has become increasingly frequent, and photo translation (photo, image, or scene-text translation) technology has become an important tool. With it, people can recognize and translate text in other languages without manual input or translation, which has practical value in fields such as tourism, business, education, and research. Photo translation has thus become an indispensable tool, bringing convenience to people’s lives and work. To that end, this paper aims at high-accuracy English-to-Chinese photo translation, a pipeline with three stages: text detection, text recognition, and text translation (i.e., machine translation). We observe that detection and recognition face challenges such as occluded text, handwritten text, scene text, text with complex layouts, distorted text, and many others; however, in this paper we limit our analysis to the translation phase. For the detection and recognition phases, we adopt current state-of-the-art methods: the DBNet (Liao et al., 2020) model for detection and the ABINet (Fang et al., 2021) model for recognition. For translation, we use a Transformer model with modifications aimed at improving translation accuracy, in two aspects: data preprocessing and the optimizer. In data preprocessing, we use the BPE (Byte Pair Encoding) algorithm instead of basic word-level tokenization. In this context, BPE splits words into smaller subwords, which alleviates the rare-word problem to some extent and provides better word vectors for language-model training.
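To illustrate the subword idea (this is a generic sketch in the style of the original BPE procedure, not the paper's preprocessing code; the toy corpus and merge count are invented for the example), a minimal merge-learning routine over a `{word: count}` vocabulary:

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(pair, vocab):
    """Apply one merge rule: join the pair wherever it appears as two
    whole adjacent symbols (lookarounds avoid partial-token matches)."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

def learn_bpe(vocab, num_merges):
    """Greedily learn up to num_merges merge rules: at each step, merge
    the most frequent adjacent pair across the corpus."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Toy corpus: characters separated by spaces, with an end-of-word marker.
corpus = {"l o w </w>": 5, "l o w e r </w>": 2,
          "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, vocab = learn_bpe(corpus, 8)
```

Rare words the tokenizer has never seen whole (e.g. "lowest") can still be segmented into learned subwords ("low", "est"), which is the property the paragraph above appeals to.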
For the optimizer, we use Lion, proposed by Google, instead of the widely used Adam. Lion reduces the loss more quickly than Adam at small batch sizes; with batch size 256 it achieves the lowest test loss, 0.392842 (−1.072171), and the highest BLEU-4 score, 0.381281 (+0.24063). This helps reduce the consumption of training resources and improves the sustainability of deep learning.
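For reference, the Lion update rule as published (Chen et al., 2023) can be sketched in NumPy as below; this is an illustration of the rule itself, not the paper's training code, and the toy loss and hyperparameter values are chosen only for the example:

```python
import numpy as np

def lion_update(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion step: the parameter update is the sign of an interpolation
    between the momentum and the current gradient, plus decoupled weight
    decay; the momentum EMA is then refreshed with beta2."""
    c = beta1 * m + (1.0 - beta1) * g      # interpolated update direction
    w = w - lr * (np.sign(c) + wd * w)     # sign step + decoupled decay
    m = beta2 * m + (1.0 - beta2) * g      # momentum update
    return w, m

# One step on a toy quadratic loss 0.5 * ||w||^2, whose gradient is w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
w, m = lion_update(w, w.copy(), m, lr=0.05)
```

Because the step is a sign (magnitude fixed by the learning rate) and only one momentum buffer is kept, Lion uses less state than Adam, which is consistent with the resource-saving point above.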