Enhanced Training Methods for Multiple Languages

Hai Li, Y. Li
{"title":"Enhanced Training Methods for Multiple Languages","authors":"Hai Li, Y. Li","doi":"10.18653/v1/2023.dialdoc-1.6","DOIUrl":null,"url":null,"abstract":"Document-grounded dialogue generation based on multilingual is a challenging and realistic task. Unlike previous tasks, it need to tackle with multiple high-resource languages facilitating low-resource languages. This paper summarizes our research based on a three-stage pipeline that includes retrieval, re-rank and generation where each component is individually optimized. In different languages with limited data scenarios, we mainly improve the robustness of the pipeline through data augmentation and embedding perturbation with purpose of improving the performance designing three training methods: cross-language enhancement training, weighted training with neighborhood distribution augmentation, and ensemble adversarial training, all of that can be used as plug and play modules. Through experiments with different settings, it has been shown that our methods can effectively improve the generalization performance of pipeline with score ranking 6th among the public submissions on leaderboards.","PeriodicalId":190893,"journal":{"name":"Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2023.dialdoc-1.6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Multilingual document-grounded dialogue generation is a challenging and realistic task. Unlike previous tasks, it requires multiple high-resource languages to facilitate low-resource languages. This paper summarizes our research based on a three-stage pipeline consisting of retrieval, re-ranking, and generation, where each component is individually optimized. For languages with limited data, we mainly improve the robustness of the pipeline through data augmentation and embedding perturbation, designing three training methods to improve performance: cross-language enhancement training, weighted training with neighborhood distribution augmentation, and ensemble adversarial training, all of which can be used as plug-and-play modules. Experiments under different settings show that our methods effectively improve the generalization performance of the pipeline, with our score ranking 6th among the public submissions on the leaderboard.
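The abstract gives no implementation details, but adversarial training via embedding perturbation is commonly realized as an FGM-style gradient perturbation applied to the token-embedding weights. The sketch below is a minimal illustration under that assumption; the class name `FGM`, the `epsilon` value, the embedding-parameter name, and the use of PyTorch are illustrative choices, not taken from the paper.

```python
# Minimal sketch of adversarial training via embedding perturbation
# (FGM-style). This is one plausible reading of "embedding perturbation"
# in the abstract; names and hyperparameters are illustrative only.
import torch

class FGM:
    """Adds an L2-normalized gradient perturbation to embedding weights."""

    def __init__(self, model, epsilon=1.0, emb_name="embeddings.word_embeddings"):
        self.model = model
        self.epsilon = epsilon
        self.emb_name = emb_name
        self.backup = {}

    def attack(self):
        # Perturb embedding parameters in the direction of their gradient.
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        # Undo the perturbation before the optimizer step.
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

def train_step(model, batch, optimizer, fgm):
    # Clean forward/backward pass.
    loss = model(**batch).loss
    loss.backward()
    # Adversarial pass on the perturbed embeddings; gradients accumulate.
    fgm.attack()
    adv_loss = model(**batch).loss
    adv_loss.backward()
    fgm.restore()
    optimizer.step()
    optimizer.zero_grad()
```

Because the perturbation only touches the embedding weights and restores them after each step, such a module can be bolted onto an existing retrieval, re-ranking, or generation trainer without changing the model code, which is consistent with the abstract's description of the methods as plug-and-play.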