A Web-Based Platform for Real-Time Speech Emotion Recognition using CNN

Damir Kabdualiyev, Askar Madiyev, Adil Rakhaliyev, Balgynbek Dikhan, Kassymzhan Gizhduan, Hashim Ali
{"title":"A Web-Based Platform for Real-Time Speech Emotion Recognition using CNN","authors":"Damir Kabdualiyev, Askar Madiyev, Adil Rakhaliyev, Balgynbek Dikhan, Kassymzhan Gizhduan, Hashim Ali","doi":"10.1109/SmartNets58706.2023.10215937","DOIUrl":null,"url":null,"abstract":"This pilot study presents a web-based real-time speech emotion recognition platform using a convolutional neural network algorithm. The study aims to develop a reliable tool for predicting emotions in speech with a user-friendly design to enable easy access and display of recognition results. The platform recognizes seven emotions (angry, disgust, fear, happy, neutral, sad, and surprise) and has two functionalities: static and real-time speech signals analysis. The static analysis allows users to upload pre-recorded audio files for analysis, while the real-time analysis provides continuous audio processing as it is being recorded. The study also focuses on developing a reliable model with minimal features to predict emotions while accurately identifying various emotions detected in speech. The algorithmic performance of the model was evaluated using publicly available datasets (RAVDESS, TESS, and SAVEE). It achieved an accuracy of 86.46% in static analysis using the selected spectral feature: i.e., MFCC. The performance of the real-time analysis was validated through a user study involving 20 participants. It achieved an accuracy of 65% in recognizing emotions in real-time due to possible known factors. An interesting finding was the discrepancy between how individuals perceived their emotions and those detected by the ML model. The accuracy of the ML model was higher in pre-recorded audio recognition and about the same in real-time recognition compared to previous works. 
The user-friendly design and CNN algorithm make it a promising solution to address challenges in emotion recognition and highlight the importance of further research in this field.","PeriodicalId":301834,"journal":{"name":"2023 International Conference on Smart Applications, Communications and Networking (SmartNets)","volume":"29 8","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Smart Applications, Communications and Networking (SmartNets)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SmartNets58706.2023.10215937","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

This pilot study presents a web-based real-time speech emotion recognition platform built on a convolutional neural network (CNN). The study aims to develop a reliable tool for predicting emotions in speech, with a user-friendly design that makes recognition results easy to access and display. The platform recognizes seven emotions (angry, disgust, fear, happy, neutral, sad, and surprise) and offers two functionalities: static and real-time speech signal analysis. Static analysis lets users upload pre-recorded audio files for processing, while real-time analysis processes audio continuously as it is being recorded. The study also focuses on developing a reliable model that predicts emotions from a minimal feature set while accurately distinguishing the various emotions detected in speech. The model's algorithmic performance was evaluated on publicly available datasets (RAVDESS, TESS, and SAVEE), where it achieved an accuracy of 86.46% in static analysis using the selected spectral feature, MFCC (Mel-frequency cepstral coefficients). Real-time performance was validated through a user study involving 20 participants, in which the platform achieved an accuracy of 65%, a gap the authors attribute to possible known factors. An interesting finding was the discrepancy between how individuals perceived their own emotions and those detected by the ML model. Compared to previous works, the model's accuracy was higher on pre-recorded audio and about the same in real-time recognition. The user-friendly design and CNN algorithm make the platform a promising solution to challenges in emotion recognition and highlight the importance of further research in this field.
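The paper does not publish its feature pipeline, but since MFCC is named as the sole spectral feature, a minimal NumPy-only sketch of MFCC extraction may help illustrate the step that precedes the CNN classifier. The frame size, mel-band count, and coefficient count below are assumptions for illustration, not values from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_features(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Return one n_mfcc-dim vector: MFCCs averaged over all frames."""
    # 1. Frame the signal and apply a Hann window
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    frames = frames * np.hanning(n_fft)
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Triangular mel filterbank spanning 0 Hz to Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # 4. DCT-II across mel bands, keeping the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), n + 0.5) / n_mels)
    per_frame = log_mel @ dct.T
    # 5. Average over time to obtain a fixed-size feature vector
    return per_frame.mean(axis=0)

# Example: one second of synthetic audio → a 13-dim feature vector
features = mfcc_features(np.random.randn(16000))
print(features.shape)  # (13,)
```

In practice, a library such as librosa would typically handle this step; the averaged coefficient vector (or the full per-frame MFCC matrix) would then be fed to the CNN classifier.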