Modelling emotional valence and arousal of non-linguistic utterances for sound design support

A. Khota, E. Cooper, Yu Yan, Máté Kovács
DOI: 10.5821/conference-9788419184849.52
Published in: 9th International Conference on Kansei Engineering and Emotion Research (KEER2022) Proceedings
Publication date: 2022-09-01

Abstract

Non-Linguistic Utterances (NLUs), produced for popular media, computers, robots, and public spaces, can quickly and wordlessly convey the emotional characteristics of a message, and they have been studied for their ability to convey affect in robot communication. The objective of this research is to develop a model that correctly infers the emotional Valence and Arousal of an NLU. Seventeen subjects rated, on a 7-point Likert scale, the relative Valence and Arousal of 560 sounds collected from popular movies, TV shows, and video games, including NLUs and other character utterances. Three audio feature sets were used to extract features including spectral energy, spectral spread, zero-crossing rate (ZCR), Mel Frequency Cepstral Coefficients (MFCCs), and audio chroma, as well as pitch, jitter, formant, shimmer, loudness, and Harmonics-to-Noise Ratio, among others. After feature reduction by Factor Analysis, the best-performing models inferred average Valence with a Mean Absolute Error (MAE) of 0.107 and Arousal with an MAE of 0.097 on audio samples held out from training. These results suggest the model infers the Valence and Arousal of most NLUs with an error smaller than the difference between successive rating points on the 7-point Likert scale (0.14). This inference system is applicable to the development of novel NLUs to augment robot-human communication, or to the design of sounds for other systems, machines, and settings.
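The pipeline the abstract describes — extract audio features, reduce them to latent factors, then regress Valence and Arousal and score held-out clips by MAE — can be sketched as follows. This is a minimal, hedged illustration, not the paper's implementation: it uses synthetic tone "clips", only three toy features (ZCR, RMS energy, spectral centroid) standing in for the three full feature sets, SVD projection standing in for Factor Analysis, and invented rating targets.

```python
import numpy as np

rng = np.random.default_rng(0)
SR = 16000  # sample rate in Hz (assumed for this sketch)

def extract_features(clip):
    """Toy stand-ins for features named in the paper: zero-crossing
    rate, RMS energy, and a spectral centroid from the FFT magnitude."""
    zcr = np.mean(np.abs(np.diff(np.sign(clip))) > 0)
    rms = np.sqrt(np.mean(clip ** 2))
    mag = np.abs(np.fft.rfft(clip))
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / SR)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
    return np.array([zcr, rms, centroid / (SR / 2)])  # normalised

# Synthetic "utterances": short tones with varying pitch and loudness.
n_clips = 200
t = np.arange(int(0.25 * SR)) / SR
pitches = rng.uniform(100.0, 2000.0, n_clips)
amps = rng.uniform(0.1, 1.0, n_clips)
X = np.stack([extract_features(a * np.sin(2 * np.pi * f * t))
              for f, a in zip(pitches, amps)])

# Invented ground truth (NOT the study's ratings): arousal tracks
# loudness and valence tracks pitch, scaled into a [0, 1] range.
y_arousal = (amps - 0.1) / 0.9
y_valence = (pitches - 100.0) / 1900.0

# Feature reduction: project onto the top singular vectors, a simple
# stand-in for the paper's Factor Analysis step.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T  # keep 2 latent factors

# Ordinary least squares on the reduced features; evaluate on a
# held-out split, mirroring the paper's held-out MAE evaluation.
train, test = slice(0, 150), slice(150, None)

def fit_predict(y):
    A = np.column_stack([Z[train], np.ones(150)])
    coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    B = np.column_stack([Z[test], np.ones(n_clips - 150)])
    return B @ coef

mae_val = np.mean(np.abs(fit_predict(y_valence) - y_valence[test]))
mae_aro = np.mean(np.abs(fit_predict(y_arousal) - y_arousal[test]))
print(f"held-out MAE  valence={mae_val:.3f}  arousal={mae_aro:.3f}")
```

Because ZCR and spectral centroid are nearly linear in pitch for pure tones, and RMS is linear in amplitude, the toy regression recovers both targets with small error; real NLUs require the much richer feature sets and Factor Analysis the paper uses.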