USST's System for AutoSimTrans 2022

Proceedings of the Third Workshop on Automatic Simultaneous Translation Pub Date : 1900-01-01 DOI:10.18653/v1/2022.autosimtrans-1.7

Zhu Hui, Yu Jun


Citations: 1

Abstract

This paper describes our submitted text-to-text simultaneous translation (ST) system, which won second place in the Chinese→English streaming translation task of AutoSimTrans 2022. Our baseline system is a BPE-based Transformer model trained with the PaddlePaddle framework. In our experiments, we employ data synthesis and ensemble approaches to enhance the base model. To bridge the gap between the general domain and the spoken domain, we select in-domain data from the general corpus and mix it with the spoken corpus for mixed fine-tuning. Finally, we adopt a fixed wait-k policy to convert our full-sentence translation model into a simultaneous translation model. Experiments on the development data show that our system outperforms the baseline system.
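The fixed wait-k policy named in the abstract can be illustrated with a minimal sketch. It assumes the usual formulation of wait-k, in which the decoder first reads k source tokens and then alternates between writing one target token and reading one more source token. The function names and the `next_token` callback are hypothetical placeholders for illustration, not the authors' PaddlePaddle implementation.

```python
from typing import Callable, List, Sequence


def wait_k_schedule(src_len: int, tgt_len: int, k: int) -> List[int]:
    """Return, for each target step t = 1..tgt_len, how many source
    tokens are visible under a fixed wait-k policy: min(k + t - 1, src_len)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]


def wait_k_decode(
    source: Sequence[str],
    k: int,
    next_token: Callable[[Sequence[str], Sequence[str]], str],
    eos: str = "</s>",
    max_len: int = 50,
) -> List[str]:
    """Greedy decoding in which each step sees only a source prefix.

    `next_token(prefix, target_so_far)` is a hypothetical stand-in for a
    full-sentence translation model queried on a partial source.
    """
    target: List[str] = []
    while len(target) < max_len:
        # Source tokens available before emitting the next target token.
        visible = min(k + len(target), len(source))
        token = next_token(source[:visible], target)
        if token == eos:
            break
        target.append(token)
    return target


def toy_model(prefix: Sequence[str], target: Sequence[str]) -> str:
    """Toy 'model' (an assumption for the demo): uppercases the source
    token aligned with the current target position, else emits EOS."""
    if len(target) < len(prefix):
        return prefix[len(target)].upper()
    return "</s>"


if __name__ == "__main__":
    print(wait_k_schedule(src_len=5, tgt_len=6, k=2))
    print(wait_k_decode(["a", "b", "c", "d"], k=2, next_token=toy_model))
```

A smaller k lowers latency because translation starts after fewer source tokens, at the cost of giving the model less context at each step.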
