An Emotional Respiration Speech Dataset

Rozemarijn Roes, Francisca Pessanha, Almila Akdag Salah
{"title":"An Emotional Respiration Speech Dataset","authors":"Rozemarijn Roes, Francisca Pessanha, Almila Akdag Salah","doi":"10.1145/3536220.3558803","DOIUrl":null,"url":null,"abstract":"Natural interaction with human-like embodied agents, such as social robots or virtual agents, relies on the generation of realistic non-verbal behaviours, including body language, gaze and facial expressions. Humans can read and interpret somatic social signals, such as blushing or changes in the respiration rate and depth, as part of such non-verbal behaviours. Studies show that realistic breathing changes in an agent improve the communication of emotional cues, but there are scarcely any databases for affect analysis with breathing ground truth to learn how affect and breathing correlate. Emotional speech databases typically contain utterances coloured by emotional intonation, instead of natural conversation, and lack breathing annotations. In this paper, we introduce the Emotional Speech Respiration Dataset, collected from 20 subjects in a spontaneous speech setting where emotions are elicited via music. Four emotion classes (happy, sad, annoying, calm) are elicited, with 20 minutes of data per participant. The breathing ground truth is collected with piezoelectric respiration sensors, and affective labels are collected via self-reported valence and arousal levels. Along with these, we extract and share visual features of the participants (such as facial keypoints, action units, gaze directions), transcriptions of the speech instances, and paralinguistic features. Our analysis shows that the music induced emotions show significant changes in the levels of valence for all four emotions, compared to the baseline. Furthermore, the breathing patterns change with happy music significantly, but the changes in other elicitors are less prominent. We believe this resource can be used with different embodied agents to signal affect via simulated breathing.","PeriodicalId":186796,"journal":{"name":"Companion Publication of the 2022 International Conference on Multimodal Interaction","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Publication of the 2022 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3536220.3558803","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Natural interaction with human-like embodied agents, such as social robots or virtual agents, relies on the generation of realistic non-verbal behaviours, including body language, gaze and facial expressions. Humans can read and interpret somatic social signals, such as blushing or changes in respiration rate and depth, as part of such non-verbal behaviours. Studies show that realistic breathing changes in an agent improve the communication of emotional cues, but there are scarcely any affect-analysis databases with breathing ground truth from which to learn how affect and breathing correlate. Emotional speech databases typically contain utterances coloured by emotional intonation, rather than natural conversation, and lack breathing annotations. In this paper, we introduce the Emotional Speech Respiration Dataset, collected from 20 subjects in a spontaneous speech setting where emotions are elicited via music. Four emotion classes (happy, sad, annoying, calm) are elicited, with 20 minutes of data per participant. The breathing ground truth is collected with piezoelectric respiration sensors, and affective labels are collected via self-reported valence and arousal levels. Along with these, we extract and share visual features of the participants (such as facial keypoints, action units, gaze directions), transcriptions of the speech instances, and paralinguistic features. Our analysis shows that the music-induced emotions produce significant changes in valence levels for all four emotions, compared to the baseline. Furthermore, breathing patterns change significantly with happy music, but the changes for other elicitors are less prominent. We believe this resource can be used with different embodied agents to signal affect via simulated breathing.
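To make the described pipeline concrete, below is a minimal sketch (not code from the paper) of how a piezoelectric respiration-belt signal could be summarised into a breathing rate, and how self-reported valence under a music condition could be compared against a baseline with a paired test. The sampling rate, file layout, and the example ratings are assumptions for illustration only.

```python
# Minimal sketch: breathing-rate estimation from a respiration-belt signal and a
# paired comparison of valence ratings. All constants and data are hypothetical.
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import wilcoxon

FS = 100  # assumed sensor sampling rate in Hz (not specified here)

def breaths_per_minute(respiration: np.ndarray, fs: int = FS) -> float:
    """Estimate breathing rate by counting inhalation peaks in the belt signal."""
    # Normalise the signal, then require peaks at least 2 s apart
    # (i.e. at most ~30 breaths per minute) with a minimum prominence.
    signal = (respiration - respiration.mean()) / (respiration.std() + 1e-8)
    peaks, _ = find_peaks(signal, distance=2 * fs, prominence=0.5)
    duration_min = len(respiration) / fs / 60.0
    return len(peaks) / duration_min

# Hypothetical per-participant valence ratings: baseline vs. happy-music condition.
baseline_valence = np.array([5, 6, 5, 4, 6, 5, 5, 6, 4, 5], dtype=float)
happy_valence    = np.array([7, 7, 6, 6, 8, 7, 6, 7, 6, 7], dtype=float)

# Paired non-parametric test: did the happy-music condition shift valence vs. baseline?
stat, p_value = wilcoxon(happy_valence, baseline_valence)
print(f"Wilcoxon W={stat:.1f}, p={p_value:.4f}")
```

A Wilcoxon signed-rank test is used here because the self-reported ratings are ordinal and the samples are paired per participant; the paper's actual statistical procedure may differ.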