Combining Gated Convolutional Networks and Self-Attention Mechanism for Speech Emotion Recognition

C. Li, Jinlong Jiao, Yiqin Zhao, Ziping Zhao
{"title":"结合门控卷积网络和自注意机制的语音情绪识别","authors":"C. Li, Jinlong Jiao, Yiqin Zhao, Ziping Zhao","doi":"10.1109/ACIIW.2019.8925283","DOIUrl":null,"url":null,"abstract":"Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. The predominant approach to SER to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we introduce new gated convolutional networks and apply them to SER, which can be more efficient since they allow parallelization over sequential tokens. We present a novel model architecture that incorporates a gated convolutional neural network and a temporal attention-based localization method for speech emotion recognition. To the best of the authors' knowledge, this is the first time that such a hybrid architecture is employed for SER. We demonstrate the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. The experimental results demonstrate that our proposed model outperforms current state-of-the-art approaches.","PeriodicalId":193568,"journal":{"name":"2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Combining Gated Convolutional Networks and Self-Attention Mechanism for Speech Emotion Recognition\",\"authors\":\"C. Li, Jinlong Jiao, Yiqin Zhao, Ziping Zhao\",\"doi\":\"10.1109/ACIIW.2019.8925283\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. The predominant approach to SER to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we introduce new gated convolutional networks and apply them to SER, which can be more efficient since they allow parallelization over sequential tokens. We present a novel model architecture that incorporates a gated convolutional neural network and a temporal attention-based localization method for speech emotion recognition. To the best of the authors' knowledge, this is the first time that such a hybrid architecture is employed for SER. We demonstrate the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. 
The experimental results demonstrate that our proposed model outperforms current state-of-the-art approaches.\",\"PeriodicalId\":193568,\"journal\":{\"name\":\"2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ACIIW.2019.8925283\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACIIW.2019.8925283","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. The predominant approach to SER to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we introduce new gated convolutional networks and apply them to SER, which can be more efficient since they allow parallelization over sequential tokens. We present a novel model architecture that incorporates a gated convolutional neural network and a temporal attention-based localization method for speech emotion recognition. To the best of the authors' knowledge, this is the first time that such a hybrid architecture is employed for SER. We demonstrate the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. The experimental results demonstrate that our proposed model outperforms current state-of-the-art approaches.
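The abstract names two building blocks, a gated convolutional network over frame-level features and a temporal attention mechanism that pools frames into an utterance-level representation, without giving implementation details. The sketch below is a minimal PyTorch illustration of that combination, not the authors' architecture: the module names, layer counts, feature dimension (40), hidden size, and 4-class output are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    """1-D gated convolution (GLU-style): conv_a(x) * sigmoid(conv_b(x)).
    Unlike an RNN, every time step is computed in parallel."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # One conv emits 2*out_ch channels; F.glu splits them into a
        # linear path and a sigmoid gate.
        self.conv = nn.Conv1d(in_ch, 2 * out_ch, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):              # x: (batch, channels, time)
        return F.glu(self.conv(x), dim=1)

class AttentionPooling(nn.Module):
    """Temporal attention: score each frame, softmax over time, and
    return the weighted sum as a single utterance-level vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):              # h: (batch, time, dim)
        w = torch.softmax(self.score(h), dim=1)   # (batch, time, 1)
        return (w * h).sum(dim=1)                 # (batch, dim)

class GatedConvSER(nn.Module):
    """Sequence-to-label SER: stacked gated convolutions over
    frame-level features, attention pooling, then a classifier."""
    def __init__(self, n_feats=40, hidden=128, n_classes=4):
        super().__init__()
        self.convs = nn.Sequential(
            GatedConvBlock(n_feats, hidden),
            GatedConvBlock(hidden, hidden),
        )
        self.pool = AttentionPooling(hidden)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):              # x: (batch, time, n_feats)
        h = self.convs(x.transpose(1, 2)).transpose(1, 2)
        return self.fc(self.pool(h))

# Toy usage: 8 utterances, 300 frames of 40-dim log-Mel features,
# mapped to 4 emotion classes (a common IEMOCAP setup is
# angry/happy/neutral/sad, though labels here are illustrative).
logits = GatedConvSER()(torch.randn(8, 300, 40))
print(logits.shape)  # torch.Size([8, 4])
```

Because the gated convolutions have no recurrent state, all frames of an utterance are processed in one pass, which is the parallelization advantage over RNN-based SER that the abstract refers to.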