Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network

IF 2.7 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS
Nhat Truong Pham, Sy Dzung Nguyen, Vu Song Thuy Nguyen, Bich Ngoc Hong Pham, Duc Ngoc Minh Dang
{"title":"Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network","authors":"Nhat Truong Pham, Sy Dzung Nguyen, Vu Song Thuy Nguyen, Bich Ngoc Hong Pham, Duc Ngoc Minh Dang","doi":"10.1080/24751839.2023.2187278","DOIUrl":null,"url":null,"abstract":"ABSTRACT Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated lots of techniques to improve the accuracy of SER, it has been challenging with feature extraction, classifier schemes, and computational costs. To address the aforementioned problems, we propose a new set of 1D features extracted by using an overlapping sliding window (OSW) technique for SER in this study. In addition, a deep neural network-based classifier scheme called the deep Pattern Recognition Network (PRN) is designed to categorize emotional states from the new set of 1D features. We evaluate the proposed method on the Emo-DB and the AESSD datasets that contain several different emotional states. The experimental results show that the proposed method achieves an accuracy of 98.5% and 87.1% on the Emo-DB and AESSD datasets, respectively. It is also more comparable with accuracy to and better than the state-of-the-art and current approaches that use 1D features on the same datasets for SER. Furthermore, the SHAP (SHapley Additive exPlanations) analysis is employed for interpreting the prediction model to assist system developers in selecting the optimal features to integrate into the desired system.","PeriodicalId":32180,"journal":{"name":"Journal of Information and Telecommunication","volume":"7 1","pages":"317 - 335"},"PeriodicalIF":2.7000,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information and Telecommunication","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/24751839.2023.2187278","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 2

Abstract

ABSTRACT Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated lots of techniques to improve the accuracy of SER, it has been challenging with feature extraction, classifier schemes, and computational costs. To address the aforementioned problems, we propose a new set of 1D features extracted by using an overlapping sliding window (OSW) technique for SER in this study. In addition, a deep neural network-based classifier scheme called the deep Pattern Recognition Network (PRN) is designed to categorize emotional states from the new set of 1D features. We evaluate the proposed method on the Emo-DB and the AESSD datasets that contain several different emotional states. The experimental results show that the proposed method achieves an accuracy of 98.5% and 87.1% on the Emo-DB and AESSD datasets, respectively. It is also more comparable with accuracy to and better than the state-of-the-art and current approaches that use 1D features on the same datasets for SER. Furthermore, the SHAP (SHapley Additive exPlanations) analysis is employed for interpreting the prediction model to assist system developers in selecting the optimal features to integrate into the desired system.
基于重叠滑动窗和Shapley加性可解释深度神经网络的语音情感识别
摘要语音情感识别(SER)具有多种应用,如电子学习、人机交互、客户服务和医疗保健系统。尽管研究人员已经研究了许多提高SER准确性的技术,但在特征提取、分类器方案和计算成本方面一直存在挑战。为了解决上述问题,我们在本研究中提出了一组新的1D特征,通过使用重叠滑动窗口(OSW)技术提取SER。此外,还设计了一种基于深度神经网络的分类器方案,称为深度模式识别网络(PRN),用于从新的1D特征集中对情绪状态进行分类。我们在包含几种不同情绪状态的Emo DB和AESSD数据集上评估了所提出的方法。实验结果表明,该方法在Emo-DB和AESSD数据集上的准确率分别为98.5%和87.1%。它在精度上也比在SER的相同数据集上使用1D特征的最先进和当前方法更具可比性,并且更好。此外,SHAP(SHapley Additive exPlanations)分析被用于解释预测模型,以帮助系统开发人员选择最佳特征以集成到期望的系统中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
7.50
自引率
0.00%
发文量
18
审稿时长
27 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信