Combining Gated Convolutional Networks and Self-Attention Mechanism for Speech Emotion Recognition

C. Li, Jinlong Jiao, Yiqin Zhao, Ziping Zhao
{"title":"结合门控卷积网络和自注意机制的语音情绪识别","authors":"C. Li, Jinlong Jiao, Yiqin Zhao, Ziping Zhao","doi":"10.1109/ACIIW.2019.8925283","DOIUrl":null,"url":null,"abstract":"Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. The predominant approach to SER to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we introduce new gated convolutional networks and apply them to SER, which can be more efficient since they allow parallelization over sequential tokens. We present a novel model architecture that incorporates a gated convolutional neural network and a temporal attention-based localization method for speech emotion recognition. To the best of the authors' knowledge, this is the first time that such a hybrid architecture is employed for SER. We demonstrate the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. The experimental results demonstrate that our proposed model outperforms current state-of-the-art approaches.","PeriodicalId":193568,"journal":{"name":"2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Combining Gated Convolutional Networks and Self-Attention Mechanism for Speech Emotion Recognition\",\"authors\":\"C. Li, Jinlong Jiao, Yiqin Zhao, Ziping Zhao\",\"doi\":\"10.1109/ACIIW.2019.8925283\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. The predominant approach to SER to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we introduce new gated convolutional networks and apply them to SER, which can be more efficient since they allow parallelization over sequential tokens. We present a novel model architecture that incorporates a gated convolutional neural network and a temporal attention-based localization method for speech emotion recognition. To the best of the authors' knowledge, this is the first time that such a hybrid architecture is employed for SER. We demonstrate the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. 
The experimental results demonstrate that our proposed model outperforms current state-of-the-art approaches.\",\"PeriodicalId\":193568,\"journal\":{\"name\":\"2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ACIIW.2019.8925283\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACIIW.2019.8925283","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. The predominant approach to SER to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we introduce new gated convolutional networks and apply them to SER, which can be more efficient since they allow parallelization over sequential tokens. We present a novel model architecture that incorporates a gated convolutional neural network and a temporal attention-based localization method for speech emotion recognition. To the best of the authors' knowledge, this is the first time that such a hybrid architecture is employed for SER. We demonstrate the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. The experimental results demonstrate that our proposed model outperforms current state-of-the-art approaches.
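The abstract names two building blocks, a gated convolutional network over frame-level features and a temporal attention mechanism that pools frames into an utterance-level representation, without giving implementation details. The sketch below is a minimal PyTorch illustration of that combination, not the authors' architecture: the module names, layer counts, feature dimension (40), hidden size, and 4-class output are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    """1-D gated convolution (GLU-style): conv_a(x) * sigmoid(conv_b(x)).
    Unlike an RNN, every time step is computed in parallel."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # One conv emits 2*out_ch channels; F.glu splits them into a
        # linear path and a sigmoid gate.
        self.conv = nn.Conv1d(in_ch, 2 * out_ch, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):              # x: (batch, channels, time)
        return F.glu(self.conv(x), dim=1)

class AttentionPooling(nn.Module):
    """Temporal attention: score each frame, softmax over time, and
    return the weighted sum as a single utterance-level vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):              # h: (batch, time, dim)
        w = torch.softmax(self.score(h), dim=1)   # (batch, time, 1)
        return (w * h).sum(dim=1)                 # (batch, dim)

class GatedConvSER(nn.Module):
    """Sequence-to-label SER: stacked gated convolutions over
    frame-level features, attention pooling, then a classifier."""
    def __init__(self, n_feats=40, hidden=128, n_classes=4):
        super().__init__()
        self.convs = nn.Sequential(
            GatedConvBlock(n_feats, hidden),
            GatedConvBlock(hidden, hidden),
        )
        self.pool = AttentionPooling(hidden)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):              # x: (batch, time, n_feats)
        h = self.convs(x.transpose(1, 2)).transpose(1, 2)
        return self.fc(self.pool(h))

# Toy usage: 8 utterances, 300 frames of 40-dim log-Mel features,
# mapped to 4 emotion classes (a common IEMOCAP setup is
# angry/happy/neutral/sad, though labels here are illustrative).
logits = GatedConvSER()(torch.randn(8, 300, 40))
print(logits.shape)  # torch.Size([8, 4])
```

Because the gated convolutions have no recurrent state, all frames of an utterance are processed in one pass, which is the parallelization advantage over RNN-based SER that the abstract refers to.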