CMRUTU: Code Mixed Roman Urdu (Roman Urdu and English) to Urdu Translator

Muhammad Wisal, A. Mustafa, Umair Arshad
Published in: 2022 24th International Multitopic Conference (INMIC), 2022-10-21
DOI: 10.1109/INMIC56986.2022.9972972 (https://doi.org/10.1109/INMIC56986.2022.9972972)

Abstract

Urdu is the official language of Pakistan and a familiar language across South Asian countries. It is spoken as a first language by nearly 70 million people and as a second language by more than 100 million people, mainly in Pakistan and India. Most textual communication in Roman Urdu is not pure Roman Urdu: English words appear within Roman Urdu sentences. A translator that can convert these code-mixed sentences into Urdu is therefore needed, because the purpose of any language is to communicate. It can be difficult for a machine to follow the shift between languages within a sentence. Previous research has addressed Urdu transliteration and rule-based translation; however, direct translation of code-mixed Roman Urdu into Urdu at this level of accuracy is novel. In this research, we introduce a mixed-language (Roman Urdu and English) to Urdu translator. The pre-trained deep learning model “g2p_multilingual_byT5_small” is fine-tuned on a newly created corpus of mixed Roman Urdu sentences and their translations in pure Urdu. With a BLEU score of 66.73, it can translate text messages, paragraphs, or other descriptions from Roman Urdu to Urdu. The research was carried out in the Python programming language, and the model was trained on Google Colab.
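
The BLEU score reported in the abstract is an n-gram overlap metric between system output and reference translations. The abstract does not say which BLEU implementation, tokenization, or smoothing was used, so the following is only a minimal sketch under stated assumptions: unsmoothed corpus-level BLEU, whitespace tokenization, and a single reference per sentence. The names `ngrams` and `corpus_bleu` are illustrative and are not taken from the authors' code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumption for illustration: tokens are obtained by whitespace
    splitting. The paper does not state its tokenization.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in hyp_counts.items()
            )
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, BLEU is zero if any n-gram order has no match.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions (uniform weights).
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Calling `corpus_bleu(model_outputs, urdu_references)` on two equal-length lists of strings returns a single score for the whole test set. Scores from this sketch are not directly comparable to the reported 66.73 unless the tokenization and smoothing match those used in the paper.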