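Improve the automatic transliteration from Nôm scripts into Vietnamese National scripts by integrating Sino – Vietnamese knowledge into Statistical Machine Translation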

Lam H. Thai, Long H. B. Nguyen, Dinh Dien
Published in: Proceedings of the 4th International Conference on Information Technology and Computer Communications
Publication date: 2022-06-23
DOI: 10.1145/3548636.3548647 (https://doi.org/10.1145/3548636.3548647)
Citations: 1

Abstract

Nôm script (chữ Nôm) is an ancient Vietnamese writing system that was widely used in Vietnam from the 10th century to the early 20th century. Several systems for automatic transliteration from Nôm script (NS) into the Vietnamese National script (chữ Quốc ngữ, QN) have been developed to help modern Vietnamese readers draw on the lessons and knowledge of previous generations by preserving the Sino-Nôm heritage. However, these systems still perform poorly in most domains other than Literature. Our research continues to employ Statistical Machine Translation (SMT) but expands the dataset to 10 domains. We also analyze the impact of Chinese characters with Sino-Vietnamese readings on Nôm-to-National-script transliteration and integrate this knowledge into our transliteration model. Our experimental results show that this approach helps the model reach a BLEU score of 94.04, an increase of 8.63 BLEU in the genealogical domain and 0.31 BLEU in the general model.