Conference of the Association for Machine Translation in the Americas: Latest Publications

Domain-Specific Text Generation for Machine Translation
Conference of the Association for Machine Translation in the Americas. Pub Date: 2022-08-11. DOI: 10.48550/arXiv.2208.05909
Yasmin Moslem, Rejwanul Haque, John D. Kelleher, Andy Way
Abstract: Preservation of domain knowledge from the source to target is crucial in any translation workflow. It is common in the translation industry to receive highly specialized projects, where there is hardly any parallel in-domain data. In such scenarios where there is insufficient in-domain data to fine-tune Machine Translation (MT) models, producing translations that are consistent with the relevant context is challenging. In this work, we propose leveraging state-of-the-art pretrained language models (LMs) for domain-specific data augmentation for MT, simulating the domain characteristics of either (a) a small bilingual dataset, or (b) the monolingual source text to be translated. Combining this idea with back-translation, we can generate huge amounts of synthetic bilingual in-domain data for both use cases. For our investigation, we used the state-of-the-art MT architecture, Transformer. We employed mixed fine-tuning to train models that significantly improve translation of in-domain texts. More specifically, our proposed methods achieved improvements of approximately 5-6 BLEU and 2-3 BLEU, respectively, on Arabic-to-English and English-to-Arabic language pairs. Furthermore, the outcome of human evaluation corroborates the automatic evaluation results.
Citations: 5
How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation?
Conference of the Association for Machine Translation in the Americas. Pub Date: 2022-08-10. DOI: 10.48550/arXiv.2208.05225
Ali Araabi, Christof Monz, Vlad Niculae
Abstract: Neural Machine Translation (NMT) is an open vocabulary problem. As a result, dealing with words that do not occur during training, known as out-of-vocabulary (OOV) words, has long been a fundamental challenge for NMT systems. The predominant method to tackle this problem is Byte Pair Encoding (BPE), which splits words, including OOV words, into sub-word segments. BPE has achieved impressive results for a wide range of translation tasks in terms of automatic evaluation metrics. While it is often assumed that by using BPE, NMT systems are capable of handling OOV words, the effectiveness of BPE in translating OOV words has not been explicitly measured. In this paper, we study to what extent BPE is successful in translating OOV words at the word level. We analyze the translation quality of OOV words based on word type, number of segments, cross-attention weights, and the frequency of segment n-grams in the training data. Our experiments show that while careful BPE settings seem to be fairly useful in translating OOV words across datasets, a considerable percentage of OOV words are translated incorrectly. Furthermore, we highlight the slightly higher effectiveness of BPE in translating OOV words for special cases, such as named entities and when the languages involved are linguistically close to each other.
Citations: 3
Consistent Human Evaluation of Machine Translation across Language Pairs
Conference of the Association for Machine Translation in the Americas. Pub Date: 2022-05-17. DOI: 10.48550/arXiv.2205.08533
Daniel Licht, Cynthia Gao, Janice Lam, Francisco Guzmán, Mona T. Diab, Philipp Koehn
Abstract: Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations for translation quality for different language pairs. We propose a new metric called XSTS that is more focused on semantic equivalence and a cross-lingual calibration method that enables more consistent assessment. We demonstrate the effectiveness of these novel contributions in large-scale evaluation studies across up to 14 language pairs, with translation both into and out of English.
Citations: 11
How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?
Conference of the Association for Machine Translation in the Americas. Pub Date: 2022-04-29. DOI: 10.48550/arXiv.2204.14268
Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzmán
Abstract: A multilingual tokenizer is a fundamental component of multilingual neural machine translation. It is trained from a multilingual corpus. Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus. However, few works have systematically answered how language imbalance in tokenizer training affects downstream performance. In this work, we analyze how translation performance changes as the data ratios among languages vary in the tokenizer training corpus. We find that while relatively better performance is often observed when languages are more equally sampled, the downstream performance is more robust to language imbalance than usually expected. Two features, UNK rate and closeness to the character level, can warn of poor downstream performance before performing the task. We also distinguish language sampling for tokenizer training from sampling for model training and show that the model is more sensitive to the latter.
Citations: 10
Why not be Versatile? Applications of the SGNMT Decoder for Machine Translation
Conference of the Association for Machine Translation in the Americas. Pub Date: 2018-03-17. DOI: 10.17863/CAM.25872
Felix Stahlberg, Danielle Saunders, Gonzalo Iglesias, B. Byrne
Abstract: SGNMT is a decoding platform for machine translation which allows pairing various modern neural models of translation with different kinds of constraints and symbolic models. In this paper, we describe three use cases in which SGNMT is currently playing an active role: (1) teaching, as SGNMT is being used for course work and student theses in the MPhil in Machine Learning, Speech and Language Technology at the University of Cambridge, (2) research, as most of the research work of the Cambridge MT group is based on SGNMT, and (3) technology transfer, as we show how SGNMT is helping to transfer research findings from the laboratory to the industry, e.g. into a product of SDL plc.
Citations: 10
Sharing User Dictionaries Across Multiple Systems with UTX-S
Conference of the Association for Machine Translation in the Americas. Pub Date: 2009-02-20. DOI: 10.1145/1499224.1499247
Francis Bond, Seiji Okura, Yu Yamamoto, Toshiki Murata, Kiyotaka Uchimoto, Michael Kato, Miwako Shimazu, Tsugiyoshi Suzuki
Abstract: Careful tuning of user-created dictionaries is indispensable when using a machine translation system for computer aided translation. However, there is no widely used standard for user dictionaries in the Japanese/English machine translation market. To address this issue, AAMT (the Asia-Pacific Association for Machine Translation) has established a specification of sharable dictionaries (UTX-S: Universal Terminology eXchange -- Simple), which can be used across different machine translation systems, thus increasing the interoperability of language resources. UTX-S is simpler than existing specifications such as UPF and OLIF. It was explicitly designed to make it easy to (a) add new user dictionaries and (b) share existing user dictionaries. This facilitates rapid user dictionary production and avoids vendor tie-in. In this study we describe the UTX-Simple (UTX-S) format, and show that it can be converted to the user dictionary formats for five commercial English-Japanese MT systems. We then present a case study where we (a) convert an on-line glossary to UTX-S, and (b) produce user dictionaries for five different systems, and then exchange them. The results show that the simplified format of UTX-S can be used to rapidly build dictionaries. Further, we confirm that customized user dictionaries are effective across systems, although with a slight loss in quality: on average, user dictionaries improved 44.8% of translations with the systems they were built for and 37.3% of translations with different systems. In ongoing work, AAMT is using UTX-S as the format in building up a user community for producing, sharing, and accumulating user dictionaries in a sustainable way.
Citations: 2
A super-function based Japanese-Chinese machine translation system for business users
Conference of the Association for Machine Translation in the Americas. Pub Date: 2004-09-28. DOI: 10.1007/978-3-540-30194-3_30
Xin Zhao, F. Ren, S. Voß
Citations: 5
A structurally diverse minimal corpus for eliciting structural mappings between languages
Conference of the Association for Machine Translation in the Americas. Pub Date: 2004-09-28. DOI: 10.1007/978-3-540-30194-3_24
Katharina Probst, A. Lavie
Citations: 5
Machine translation of online product support articles using data-driven MT system
Conference of the Association for Machine Translation in the Americas. Pub Date: 2004-09-28. DOI: 10.1007/978-3-540-30194-3_27
Stephen D. Richardson
Citations: 6
Investigation of intelligibility judgments
Conference of the Association for Machine Translation in the Americas. Pub Date: 2004-09-28. DOI: 10.1007/978-3-540-30194-3_25
F. Reeder
Citations: 7