Modelling emotional valence and arousal of non-linguistic utterances for sound design support

9th International Conference on Kansei Engineering and Emotion Research (KEER2022), Proceedings. Pub Date: 2022-09-01. DOI: 10.5821/conference-9788419184849.52

A. Khota, E. Cooper, Yu Yan, Máté Kovács


Abstract

Non-Linguistic Utterances (NLUs), produced for popular media, computers, robots, and public spaces, can quickly and wordlessly convey emotional characteristics of a message. They have been studied in terms of their ability to convey affect in robot communication. The objective of this research is to develop a model that correctly infers the emotional Valence and Arousal of an NLU. On a Likert scale, 17 subjects evaluated the relative Valence and Arousal of 560 sounds collected from popular movies, TV shows, and video games, including NLUs and other character utterances. Three audio feature sets were used to extract features including spectral energy, spectral spread, zero-crossing rate (ZCR), Mel Frequency Cepstral Coefficients (MFCCs), and audio chroma, as well as pitch, jitter, formant, shimmer, loudness, and Harmonics-to-Noise Ratio, among others. After feature reduction by Factor Analysis, the best-performing models inferred average Valence with a Mean Absolute Error (MAE) of 0.107 and Arousal with MAE of 0.097 on audio samples removed from the training stages. These results suggest the model infers Valence and Arousal of most NLUs to less than the difference between successive rating points on the 7-point Likert scale (0.14). This inference system is applicable to the development of novel NLUs to augment robot-human communication or to the design of sounds for other systems, machines, and settings.
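The evaluation described in the abstract compares a Mean Absolute Error against the spacing between successive Likert rating points (0.14). The following is a minimal sketch of that comparison, not the authors' implementation. The function names and the example ratings are hypothetical, and the ratings are assumed to be on the same normalized scale as the 0.14 step quoted in the abstract.

```python
from typing import Sequence

# Spacing between successive rating points, as quoted in the abstract.
LIKERT_STEP = 0.14


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between rated and inferred values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fraction_within_step(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    step: float = LIKERT_STEP,
) -> float:
    """Share of samples whose absolute error is smaller than one rating step."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(1 for e in errors if e < step) / len(errors)


# Hypothetical held-out ratings and model outputs (illustrative values only,
# not data from the paper).
rated_valence = [0.25, 0.5, 0.75, 1.0]
predicted_valence = [0.30, 0.40, 0.80, 0.70]

mae = mean_absolute_error(rated_valence, predicted_valence)
within = fraction_within_step(rated_valence, predicted_valence)
print(f"MAE = {mae:.3f}, below one Likert step: {mae < LIKERT_STEP}")
print(f"Fraction of samples within one step: {within:.2f}")
```

Reporting the per-sample fraction alongside the MAE is a design choice made here for illustration: an average error below one step does not by itself show that most individual sounds fall within one step.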

