Improving English-Arabic statistical machine translation with morpho-syntactic and semantic word class

Ines Turki Khemakhem, Salma Jamoussi, A. B. Hamadou
Int. J. Intell. Syst. Technol. Appl. (journal article), published 2020-04-28. DOI: 10.1504/ijista.2020.10029021 (https://doi.org/10.1504/ijista.2020.10029021)
Citations: 2

Abstract

In this paper, we present a new method for extracting and integrating morpho-syntactic and semantic word classes in a statistical machine translation (SMT) context to improve the quality of English-Arabic translation. The method can be applied across different statistical machine translation systems and to languages with complicated morphological paradigms. We first identify morpho-syntactic word classes and use them to build our statistical language model. We then apply a semantic word clustering algorithm to English. The resulting semantic word classes are projected from the English side to the featured Arabic side. This projection relies on the word alignments produced in the alignment step by the GIZA++ tool. Finally, we apply a new process that incorporates the semantic classes in order to improve SMT quality. We show the method's efficacy on both small and larger English-to-Arabic translation tasks. The experimental results show that introducing morpho-syntactic and semantic word classes yields a 7.7% relative improvement in BLEU score.
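The projection step described above can be illustrated with a minimal Python sketch. The abstract does not specify how classes are transferred across alignment links, so the majority-vote rule used here is an assumption, not the authors' procedure. The function names (`parse_links`, `project_classes`), the class labels, the `i-j` link format (source index, target index, as commonly produced after alignment symmetrization), and the transliterated toy sentences are all hypothetical.

```python
from collections import Counter, defaultdict


def parse_links(line):
    """Parse one alignment line such as "0-0 1-1" into (src_idx, tgt_idx) pairs.

    Assumption: links are written as "i-j" tokens, source index first.
    """
    return [tuple(int(x) for x in pair.split("-")) for pair in line.split()]


def project_classes(src_sents, tgt_sents, alignments, src_class):
    """Assign each target word the class most often seen on its aligned source words.

    Assumption: majority vote over all alignment links in the corpus;
    the paper may use a different projection rule.
    """
    votes = defaultdict(Counter)
    for src, tgt, line in zip(src_sents, tgt_sents, alignments):
        for i, j in parse_links(line):
            cls = src_class.get(src[i])
            if cls is not None:  # skip source words that have no class
                votes[tgt[j]][cls] += 1
    # Keep the most frequent class per target word.
    return {word: counts.most_common(1)[0][0] for word, counts in votes.items()}


# Hypothetical toy data: English source, transliterated Arabic target.
src_sents = [["the", "book"], ["a", "pen"], ["the", "pen"]]
tgt_sents = [["al", "kitab"], ["qalam"], ["al", "qalam"]]
alignments = ["0-0 1-1", "1-0", "0-0 1-1"]
src_class = {"the": "C_DET", "book": "C_OBJECT", "pen": "C_OBJECT"}

projected = project_classes(src_sents, tgt_sents, alignments, src_class)
print(projected)
```

In this sketch, unaligned target words receive no class, and source words without a class (here "a") contribute no votes. A real system would need a policy for both cases, as well as for ties between classes.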