Hybrid Attention based Multimodal Network for Spoken Language Classification.

Proceedings of the conference. Association for Computational Linguistics. Meeting
Pub Date: 2018-08-01

Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic

Citations: 0

Abstract

We examine the utility of linguistic content and vocal characteristics for multimodal deep learning in human spoken language understanding. We present a deep multimodal network with both feature attention and modality attention to classify utterance-level speech data. The proposed hybrid attention architecture helps the system focus on learning informative representations for both modality-specific feature extraction and model fusion. The experimental results show that our system achieves state-of-the-art or competitive results on three published multimodal datasets. We also demonstrate the effectiveness and generalization of our system on a medical speech dataset from an actual trauma scenario. Furthermore, we provide a detailed comparison and analysis of traditional approaches and deep learning methods on both feature extraction and fusion.
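The abstract names two attention stages, feature attention within each modality and modality attention across them, without giving their equations. The following is a minimal sketch of that general idea under stated assumptions, not the authors' implementation: it uses plain dot-product scoring followed by a softmax, and every name and value in it (`attention_pool`, the toy inputs, the query vectors) is hypothetical. In a trained network the query vectors would be learned parameters and the inputs would come from text and audio encoders.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, query):
    """Score each vector against the query, softmax the scores,
    and return (weighted sum of the vectors, attention weights)."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


# Hypothetical toy inputs: 3 text steps and 2 audio frames, each 2-dimensional.
text_steps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_frames = [[0.5, 0.5], [2.0, 0.0]]

# Hypothetical query vectors (assumed fixed here; learned in a real model).
text_query = [1.0, 1.0]
audio_query = [1.0, 0.0]
modality_query = [0.0, 1.0]

# Feature attention: pool each modality's sequence into one vector.
text_vec, text_w = attention_pool(text_steps, text_query)
audio_vec, audio_w = attention_pool(audio_frames, audio_query)

# Modality attention: weight the two modality vectors before fusing them.
fused, modality_w = attention_pool([text_vec, audio_vec], modality_query)

print("text weights:", text_w)
print("audio weights:", audio_w)
print("modality weights:", modality_w)
print("fused vector:", fused)
```

Both stages reuse the same pooling function here only to keep the sketch short; the paper may score features and modalities differently.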




