Enhancing Machine Translation by Integrating Linguistic Knowledge in the Word Alignment Module
Safae Berrichi, A. Mazroui
2020 International Conference on Intelligent Systems and Computer Vision (ISCV), 2020-06-01
DOI: 10.1109/ISCV49265.2020.9204328 (https://doi.org/10.1109/ISCV49265.2020.9204328)
Citations: 2
Abstract
Word alignment, a critical step in statistical machine translation (SMT) systems, has been suggested by several researchers as a promising avenue for improving neural machine translation (NMT) performance in low-resource settings. Moreover, the morphological richness and complexity of Arabic, compared with English, degrades English/Arabic machine translation quality. In this study we therefore assessed the relevance of integrating morphosyntactic characteristics during the alignment phase. We enriched parallel corpora with morphosyntactic features such as stems, lemmas, roots, and POS tags, and then developed new SMT systems, each embedding one of these features in the word alignment phase. The test results confirmed the benefit of using these features and identified the morphosyntactic information most relevant to the translation system.
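One way to picture feature-based alignment is the sketch below. It is not the authors' implementation; it only assumes that each surface token is replaced one-for-one by a feature value (here a lemma), so that alignment links computed on the feature-level corpus can be carried back to the surface words by position. The helper names `to_feature_tokens` and `project_links`, the toy lemma table (with Buckwalter-style transliteration on the Arabic side), and the alignment links are all hypothetical and hand-written for illustration; in practice the features would come from a morphological analyzer and the links from an external word aligner.

```python
from typing import Dict, List, Tuple


def to_feature_tokens(tokens: List[str], feature_of: Dict[str, str]) -> List[str]:
    """Replace each surface token by its feature value, keeping positions.

    Tokens missing from the table are left unchanged.
    """
    return [feature_of.get(tok, tok) for tok in tokens]


def project_links(
    links: List[Tuple[int, int]],
    src_tokens: List[str],
    tgt_tokens: List[str],
) -> List[Tuple[str, str]]:
    """Map (source index, target index) links back onto surface tokens."""
    return [(src_tokens[i], tgt_tokens[j]) for i, j in links]


# Toy, hand-written lemma table (assumption: not the output of a real analyzer).
toy_lemmas = {"his": "he", "books": "book", "ktbh": "ktAb"}

src_surface = ["his", "books"]   # English side
tgt_surface = ["ktbh"]           # Arabic side, transliterated

# Feature-level sentences that a word aligner would see instead of surface forms.
src_features = to_feature_tokens(src_surface, toy_lemmas)
tgt_features = to_feature_tokens(tgt_surface, toy_lemmas)

# Hypothetical links, as if produced by an aligner run on the feature-level corpus.
feature_links = [(0, 0), (1, 0)]

# Positions are preserved, so the links apply directly to the surface tokens.
surface_pairs = project_links(feature_links, src_surface, tgt_surface)
print(src_features, tgt_features, surface_pairs)
```

Because the substitution is position-preserving, the rest of an SMT pipeline could keep working on surface forms while only the alignment step sees the less sparse feature vocabulary.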