A Neural Attention-Based Encoder-Decoder Approach for English to Bangla Translation

Comput. Sci. J. Moldova, Pub Date: 2023-04-01, DOI: 10.56415/csjm.v31.04

Abdullah Al Shiam, S. M. Redwan, Humaun Kabir, Jungpil Shin



Abstract



Machine translation (MT) is the process of translating text from one language to another using bilingual datasets and grammatical rules. Recent work in MT has popularized sequence-to-sequence models that leverage neural attention and deep learning. The success of neural attention models has yet to be turned into a robust framework for automated English-to-Bangla translation, owing to the lack of a comprehensive dataset that covers the diverse vocabulary of the Bangla language. In this study, we propose an English-to-Bangla MT system based on an encoder-decoder attention model and the CCMatrix corpus. Our results show that this model can outperform traditional statistical (SMT) and rule-based (RBMT) models, reaching a Bilingual Evaluation Understudy (BLEU) score of 15.68 despite being constrained by the limited vocabulary of the corpus. We hypothesize that, with a more diverse and accurate dataset, this model could be used for state-of-the-art machine translation. This work can be extended to incorporate several newer datasets using transfer learning techniques.
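To make the phrase "encoder-decoder attention model" concrete, the following is a minimal sketch of a single attention step: the decoder state is scored against each encoder state, the scores are normalized with a softmax, and the context vector is the weighted sum of encoder states. The abstract does not say which scoring function the authors use, so the dot-product scoring here is an assumption, and the names `softmax`, `attend`, `encoder_states`, and `decoder_state` as well as the toy numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Assumption: scores are plain dot products, chosen only for
    illustration; the paper's actual scoring function is not stated
    in the abstract.
    """
    # One score per source position: how well it matches the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three source positions, hidden size 2 (made-up numbers).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

In a full translation model the context vector would be combined with the decoder state to predict the next target-language token; that part is omitted here.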
