Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance

Huang-Cheng Chou, Haibin Wu, Chi-Chun Lee
{"title":"Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance","authors":"Huang-Cheng Chou, Haibin Wu, Chi-Chun Lee","doi":"arxiv-2409.10762","DOIUrl":null,"url":null,"abstract":"Speech Emotion Recognition (SER) systems rely on speech input and emotional\nlabels annotated by humans. However, various emotion databases collect\nperceptional evaluations in different ways. For instance, the IEMOCAP dataset\nuses video clips with sounds for annotators to provide their emotional\nperceptions. However, the most significant English emotion dataset, the\nMSP-PODCAST, only provides speech for raters to choose the emotional ratings.\nNevertheless, using speech as input is the standard approach to training SER\nsystems. Therefore, the open question is the emotional labels elicited by which\nscenarios are the most effective for training SER systems. We comprehensively\ncompare the effectiveness of SER systems trained with labels elicited by\ndifferent modality stimuli and evaluate the SER systems on various testing\nconditions. Also, we introduce an all-inclusive label that combines all labels\nelicited by various modalities. We show that using labels elicited by\nvoice-only stimuli for training yields better performance on the test set,\nwhereas labels elicited by voice-only stimuli.","PeriodicalId":501034,"journal":{"name":"arXiv - EE - Signal Processing","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10762","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Speech Emotion Recognition (SER) systems rely on speech input and emotional labels annotated by humans. However, emotion databases collect these perceptual evaluations in different ways. For instance, the IEMOCAP dataset presents video clips with sound so that annotators can report their emotional perceptions, whereas the most significant English emotion dataset, MSP-PODCAST, provides only speech for raters to choose their emotional ratings. Nevertheless, speech is the standard input for training SER systems. The open question, therefore, is which elicitation scenario produces the emotional labels that are most effective for training SER systems. We comprehensively compare the effectiveness of SER systems trained with labels elicited by different modality stimuli and evaluate these systems under various testing conditions. We also introduce an all-inclusive label that combines the labels elicited by all modalities. We show that training with labels elicited by voice-only stimuli yields better performance on test sets whose labels were likewise elicited by voice-only stimuli.
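The "all-inclusive label" described in the abstract pools the perceptual evaluations collected under each stimulus modality into a single training target. Below is a minimal sketch of one plausible way to construct such a label, assuming categorical rater votes are aggregated into a soft-label distribution with equal weight per vote; the emotion set, function names, and combination rule are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code): combine per-rater emotion votes gathered
# under different stimulus modalities into one "all-inclusive" soft label.
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # illustrative class set

def soft_label(votes: list[str]) -> np.ndarray:
    """Turn a list of categorical rater votes into a soft-label distribution."""
    counts = np.array([votes.count(e) for e in EMOTIONS], dtype=float)
    return counts / counts.sum()

def all_inclusive_label(labels_by_modality: dict[str, list[str]]) -> np.ndarray:
    """Pool the votes from every stimulus modality into one distribution.

    Assumes each vote carries equal weight regardless of modality; the
    paper's exact combination rule may differ.
    """
    pooled = [v for votes in labels_by_modality.values() for v in votes]
    return soft_label(pooled)

# Example: one utterance annotated under two elicitation conditions.
annotations = {
    "voice_only":   ["angry", "angry", "neutral"],
    "audio_visual": ["angry", "happy", "happy"],
}
print(all_inclusive_label(annotations))  # -> [0.5, 0.333, 0.167, 0.0]
```

An SER model could then be trained against this distribution with a soft-label objective such as KL divergence, a common choice when multiple raters disagree on an utterance's emotion.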