A Light Transformer For Speech-To-Intent Applications

2021 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2020-11-24. DOI: 10.1109/SLT48900.2021.9383559

Pu Wang, H. Van hamme

Citations: 4

Abstract

Spoken language understanding (SLU) systems can make life more agreeable and safer (e.g. in a car), and can increase the independence of physically challenged users. However, due to the many sources of variation in speech, a well-trained system is hard to transfer to other conditions, such as a different language or speech-impaired users. A remedy is to design a user-taught SLU system that learns entirely from scratch from users’ demonstrations, which in turn requires that the system’s model converge quickly after only a few training samples. In this paper, we propose a light transformer structure that uses a simplified relative position encoding, with the goal of reducing model size and improving efficiency. The light transformer serves as an alternative speech encoder for an existing user-taught multitask SLU system. Experimental results on three datasets with challenging speech conditions show that our approach outperforms the existing system and other state-of-the-art models with half of the original model size and training time.
