Title: Bridging the “gApp”: improving neural machine translation systems for multiword expression detection
Authors: Carlos Manuel Hidalgo-Ternero, G. C. Pastor
Journal: Yearbook of Phraseology, vol. 11, no. 1, pp. 61-80
Published: 2020-11-25
DOI: https://doi.org/10.1515/phras-2020-0005
Citations: 2
Abstract
The present research introduces the tool gApp, a Python-based text preprocessing system for the automatic identification and conversion of discontinuous multiword expressions (MWEs) into their continuous form in order to enhance neural machine translation (NMT). To this end, an experiment with semi-fixed verb–noun idiomatic combinations (VNICs) is carried out in order to evaluate to what extent gApp can optimise the performance of the two main freely available NMT systems (Google Translate and DeepL) under the challenge of MWE discontinuity in the Spanish-to-English direction. In the light of our promising results, the study concludes with suggestions on how to further optimise MWE-aware NMT systems.
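The conversion step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation of gApp: the mini-lexicon `VNICS`, the function `make_continuous`, and the `max_gap` parameter are all hypothetical, and the verb pattern is a crude regular expression rather than real morphological analysis. The sketch only shows the general idea of moving the material that interrupts a verb–noun idiom to a position after the idiom, so that the expression reaches the translation system in continuous form.

```python
import re

# Hypothetical mini-lexicon (an assumption, not taken from the paper):
# each entry pairs a crude verb-form pattern with the fixed nominal part.
VNICS = [
    (r"tom\w+", "el pelo"),  # tomar el pelo, "to pull someone's leg"
]


def make_continuous(sentence: str, max_gap: int = 3) -> str:
    """Move up to `max_gap` words that split a listed idiom to after it."""
    for verb_pat, noun_part in VNICS:
        pattern = re.compile(
            rf"\b(?P<verb>{verb_pat})\s+"
            rf"(?P<gap>(?:\w+\s+){{1,{max_gap}}}?)"
            rf"(?P<noun>{re.escape(noun_part)})\b",
            flags=re.IGNORECASE,
        )
        # Rebuild the match as verb + noun part + displaced words.
        sentence = pattern.sub(
            lambda m: f"{m['verb']} {m['noun']} {m['gap'].strip()}",
            sentence,
        )
    return sentence


print(make_continuous("Le tomaron constantemente el pelo."))
```

A sentence whose idiom is already continuous, such as "Le tomaron el pelo.", passes through unchanged because the pattern requires at least one intervening word. A realistic system would need lemmatisation and syntactic checks to avoid false matches that a surface pattern like this one cannot rule out.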
About the journal:
The Yearbook of Phraseology is a fully international, peer-reviewed publication dedicated to research in phraseology, a linguistic subfield concerned with the study of word combinations of varying extent and type, and different degrees of fixedness. Word combinations are ubiquitous in language and constitute a significant resource for communication. Their study is of interest to many other subdisciplines of linguistics and even to other disciplines, throwing light on the make-up of constructions, their processing and learning, the make-up and modes of creation of complex building blocks of language, the methodology and use of corpora and statistical methods, as well as on the way in which language functions.