Recognition of Emotion in Speech-related Audio Files with LSTM-Transformer

Felicia Andayani, Lau Bee Theng, Mark Tee Kit Tsun, C. Chua
{"title":"Recognition of Emotion in Speech-related Audio Files with LSTM-Transformer","authors":"Felicia Andayani, Lau Bee Theng, Mark Tee Kit Tsun, C. Chua","doi":"10.1109/icci54321.2022.9756100","DOIUrl":null,"url":null,"abstract":"In our everyday audio events, there is some emotional information in almost any speech audio received by humans. Thus, Speech Emotion Recognition (SER) has become an important research field in the last decade. SER recognizes human emotional states through human speech or daily conversation. It plays a crucial role in developing Human-Computer Interaction (HCI) and signals processing systems. Moreover, human emotions change naturally over time. Thus, it requires a good model for learning the long-term dependencies in the speech signal. In this paper, a hybrid model which combines two widely used deep learning methods is proposed. The proposed model combines the Long-Short Term Memory (LSTM) and Transformer architectures to learn the long-term dependencies through the extracted Mel Frequency Cepstral Coefficient (MFCC) features. The preliminary results of the proposed model evaluated on the publicly available dataset called RAVDESS are presented. The model achieved 75.33% of weighted accuracy (WA) and 73.12% of unweighted accuracy (UA) over the RAVDESS dataset. The experiment's result indicates the effectiveness of the proposed model in learning the temporal information from the frequency distributions according to the MFCC features.","PeriodicalId":122550,"journal":{"name":"2022 5th International Conference on Computing and Informatics (ICCI)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 5th International Conference on Computing and Informatics (ICCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icci54321.2022.9756100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In everyday listening, almost any speech a person hears carries some emotional information. Speech Emotion Recognition (SER) has therefore become an important research field over the last decade. SER recognizes human emotional states from speech or daily conversation, and it plays a crucial role in developing Human-Computer Interaction (HCI) and signal processing systems. Moreover, human emotions change naturally over time, so a good model must learn the long-term dependencies in the speech signal. In this paper, a hybrid model that combines two widely used deep learning methods is proposed. The proposed model combines the Long Short-Term Memory (LSTM) and Transformer architectures to learn long-term dependencies from extracted Mel Frequency Cepstral Coefficient (MFCC) features. Preliminary results of the proposed model, evaluated on the publicly available RAVDESS dataset, are presented. The model achieved a weighted accuracy (WA) of 75.33% and an unweighted accuracy (UA) of 73.12% on RAVDESS. The experimental results indicate the effectiveness of the proposed model in learning temporal information from the frequency distributions captured by the MFCC features.
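To make the described pipeline concrete, the sketch below shows one plausible way to wire MFCC features into an LSTM followed by a Transformer encoder. It is a minimal illustration only: the paper does not specify layer sizes, the number of MFCC coefficients, the pooling strategy, or the framework used, so the choices here (librosa for feature extraction, PyTorch modules, 40 MFCCs, a bidirectional LSTM, two encoder layers, and an 8-class head matching RAVDESS's eight emotion labels) are assumptions rather than the authors' published configuration.

```python
# Hypothetical MFCC -> LSTM -> Transformer sketch; hyperparameters are illustrative.
import torch
import torch.nn as nn
import librosa


def extract_mfcc(path, n_mfcc=40, sr=16000):
    """Load an audio file and return its MFCC frames as a (time, n_mfcc) tensor."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, time)
    return torch.tensor(mfcc.T, dtype=torch.float32)        # (time, n_mfcc)


class LSTMTransformerSER(nn.Module):
    def __init__(self, n_mfcc=40, hidden=128, n_heads=4, n_layers=2, n_classes=8):
        super().__init__()
        # Bidirectional LSTM captures local temporal context in the MFCC frames.
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        # Transformer encoder models longer-range dependencies over the LSTM outputs.
        layer = nn.TransformerEncoderLayer(d_model=2 * hidden, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):            # x: (batch, time, n_mfcc)
        out, _ = self.lstm(x)        # (batch, time, 2*hidden)
        out = self.encoder(out)      # (batch, time, 2*hidden)
        out = out.mean(dim=1)        # average-pool over time
        return self.classifier(out)  # (batch, n_classes) emotion logits


model = LSTMTransformerSER()
dummy = torch.randn(2, 300, 40)      # two utterances, 300 MFCC frames each
print(model(dummy).shape)            # torch.Size([2, 8])
```

For reading the reported metrics, the usual SER convention is that WA is the overall accuracy (so larger classes weigh more) while UA is the average of per-class recalls, which treats all emotion classes equally.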