Enhanced Training Methods for Multiple Languages

Hai Li, Y. Li
{"title":"Enhanced Training Methods for Multiple Languages","authors":"Hai Li, Y. Li","doi":"10.18653/v1/2023.dialdoc-1.6","DOIUrl":null,"url":null,"abstract":"Document-grounded dialogue generation based on multilingual is a challenging and realistic task. Unlike previous tasks, it need to tackle with multiple high-resource languages facilitating low-resource languages. This paper summarizes our research based on a three-stage pipeline that includes retrieval, re-rank and generation where each component is individually optimized. In different languages with limited data scenarios, we mainly improve the robustness of the pipeline through data augmentation and embedding perturbation with purpose of improving the performance designing three training methods: cross-language enhancement training, weighted training with neighborhood distribution augmentation, and ensemble adversarial training, all of that can be used as plug and play modules. Through experiments with different settings, it has been shown that our methods can effectively improve the generalization performance of pipeline with score ranking 6th among the public submissions on leaderboards.","PeriodicalId":190893,"journal":{"name":"Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2023.dialdoc-1.6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Multilingual document-grounded dialogue generation is a challenging and realistic task. Unlike previous tasks, it requires multiple high-resource languages to facilitate low-resource languages. This paper summarizes our research based on a three-stage pipeline consisting of retrieval, re-ranking, and generation, where each component is individually optimized. For languages with limited data, we mainly improve the robustness of the pipeline through data augmentation and embedding perturbation, designing three training methods to improve performance: cross-language enhancement training, weighted training with neighborhood distribution augmentation, and ensemble adversarial training, all of which can be used as plug-and-play modules. Experiments under different settings show that our methods effectively improve the generalization performance of the pipeline, with our score ranking 6th among the public submissions on the leaderboard.
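The abstract gives no implementation details, but adversarial training via embedding perturbation is commonly realized as an FGM-style gradient perturbation applied to the token-embedding weights. The sketch below is a minimal illustration under that assumption; the class name `FGM`, the `epsilon` value, the embedding-parameter name, and the use of PyTorch are illustrative choices, not taken from the paper.

```python
# Minimal sketch of adversarial training via embedding perturbation
# (FGM-style). This is one plausible reading of "embedding perturbation"
# in the abstract; names and hyperparameters are illustrative only.
import torch

class FGM:
    """Adds an L2-normalized gradient perturbation to embedding weights."""

    def __init__(self, model, epsilon=1.0, emb_name="embeddings.word_embeddings"):
        self.model = model
        self.epsilon = epsilon
        self.emb_name = emb_name
        self.backup = {}

    def attack(self):
        # Perturb embedding parameters in the direction of their gradient.
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        # Undo the perturbation before the optimizer step.
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

def train_step(model, batch, optimizer, fgm):
    # Clean forward/backward pass.
    loss = model(**batch).loss
    loss.backward()
    # Adversarial pass on the perturbed embeddings; gradients accumulate.
    fgm.attack()
    adv_loss = model(**batch).loss
    adv_loss.backward()
    fgm.restore()
    optimizer.step()
    optimizer.zero_grad()
```

Because the perturbation only touches the embedding weights and restores them after each step, such a module can be bolted onto an existing retrieval, re-ranking, or generation trainer without changing the model code, which is consistent with the abstract's description of the methods as plug-and-play.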