A Neural Attention-Based Encoder-Decoder Approach for English to Bangla Translation

Abdullah Al Shiam, S. M. Redwan, Humaun Kabir, Jungpil Shin
Journal: Comput. Sci. J. Moldova
Publication type: Journal Article
Publication date: 2023-04-01
DOI: 10.56415/csjm.v31.04 (https://doi.org/10.56415/csjm.v31.04)
Citations: 0

Abstract

Machine translation (MT) is the process of translating text from one language to another using bilingual datasets and grammatical rules. Recent work in MT has popularized sequence-to-sequence models that leverage neural attention and deep learning. The success of neural attention models has yet to be translated into a robust framework for automated English-to-Bangla translation, owing to the lack of a comprehensive dataset that covers the diverse vocabulary of the Bangla language. In this study, we propose an English-to-Bangla MT system that uses an encoder-decoder attention model with the CCMatrix corpus. Our results show that this model can outperform traditional statistical machine translation (SMT) and rule-based machine translation (RBMT) models, achieving a Bilingual Evaluation Understudy (BLEU) score of 15.68 despite being constrained by the limited vocabulary of the corpus. We hypothesize that, given a more diverse and accurate dataset, this model could be used successfully for state-of-the-art machine translation. This work can be extended to incorporate several newer datasets using transfer learning techniques.
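
To make the reported metric concrete, the following is a minimal sketch of how a BLEU score can be computed for one candidate translation against one reference. It is not the authors' evaluation code: the abstract does not state the tokenization, smoothing, or whether scores are aggregated at corpus level, so whitespace tokenization, no smoothing, and sentence-level scoring are assumptions here, and the names `ngrams` and `sentence_bleu` are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU on a 0-100 scale (no smoothing).

    Assumption: whitespace tokenization; real evaluations usually apply a
    language-aware tokenizer and aggregate counts over a whole test set.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the modified n-gram precisions, scaled to 0-100.
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(sentence_bleu("the cat sat on the mat", reference))
    print(sentence_bleu("the cat sat on a mat", reference))
```

Under these assumptions, an exact match scores 100 and a candidate that shares no words with the reference scores 0. A published figure such as 15.68 is normally a corpus-level value, so it should not be compared directly with per-sentence numbers from a sketch like this one.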