Simple Data Augmentation for Multilingual NLU in Task Oriented Dialogue Systems

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020; Pub Date: 1900-01-01; DOI: 10.4000/books.aaccademia.8648

Samuel Louvan, B. Magnini

Citations: 5

Abstract

Data augmentation has shown potential in alleviating data scarcity for Natural Language Understanding (e.g., slot filling and intent classification) in task-oriented dialogue systems. As prior work has mostly been evaluated on English datasets, we focus on five different languages and consider a setting where limited data is available. We investigate the effectiveness of non-gradient-based augmentation methods involving simple text span substitutions and syntactic manipulations. Our experiments show that (i) augmentation is effective in all cases, particularly for slot filling; and (ii) it is beneficial for a joint intent-slot model based on multilingual BERT, both in limited-data settings and when the full training data is used.
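To make the idea of a text span substitution concrete, the following is a minimal sketch of one plausible variant: replacing a slot value in a BIO-tagged utterance with another value of the same slot type observed elsewhere in the training data. It is an illustration under stated assumptions, not the authors' implementation; the function names (`extract_spans`, `build_lexicon`, `substitute_slot`), the BIO tagging scheme, and the toy utterances are all hypothetical.

```python
import random
from collections import defaultdict


def extract_spans(tokens, tags):
    """Return (slot_label, start, end) spans from BIO tags; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Close the open span unless this token continues it.
        if label is not None and tag != "I-" + label:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


def build_lexicon(dataset):
    """Collect the distinct values seen for each slot label (assumed helper)."""
    lexicon = defaultdict(set)
    for tokens, tags in dataset:
        for label, start, end in extract_spans(tokens, tags):
            lexicon[label].add(tuple(tokens[start:end]))
    return {label: sorted(values) for label, values in lexicon.items()}


def substitute_slot(tokens, tags, lexicon, rng):
    """Swap one randomly chosen slot value for a different value of the same label."""
    spans = extract_spans(tokens, tags)
    if not spans:
        return tokens, tags
    label, start, end = rng.choice(spans)
    current = tuple(tokens[start:end])
    candidates = [v for v in lexicon.get(label, []) if v != current]
    if not candidates:
        return tokens, tags
    value = list(rng.choice(candidates))
    # Re-tag the inserted value so tokens and tags stay aligned.
    new_tags = ["B-" + label] + ["I-" + label] * (len(value) - 1)
    return (tokens[:start] + value + tokens[end:],
            tags[:start] + new_tags + tags[end:])


# Toy data (hypothetical), two utterances sharing a "city" slot.
data = [
    (["fly", "to", "new", "york"], ["O", "O", "B-city", "I-city"]),
    (["fly", "to", "rome"], ["O", "O", "B-city"]),
]
lexicon = build_lexicon(data)
augmented = substitute_slot(data[1][0], data[1][1], lexicon, random.Random(0))
```

Because the substituted value is re-tagged with the same slot label, the augmented utterance keeps a valid token-to-tag alignment and its intent label can be left unchanged, which is what lets such a transformation feed a joint intent-slot model without any gradient-based generation step.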

