MuLER: Multiplet-Loss for Emotion Recognition

Anwer Slimi, M. Zrigui, H. Nicolas
Published in: Proceedings of the 2022 International Conference on Multimedia Retrieval, 2022-06-27
DOI: 10.1145/3512527.3531406 (https://doi.org/10.1145/3512527.3531406)
Citations: 1

Abstract

With the rise of human-machine interaction, machines need to understand humans better in order to respond appropriately. To improve communication and interaction, it would therefore be ideal for machines to detect human emotions automatically. Speech Emotion Recognition (SER) has been the focus of many studies in the past few years. However, the accuracy of existing approaches remains limited and needs to be improved. In this work, we propose a new loss function that aims to encode speech utterances rather than classify them directly, as most existing models do. The encoding is learned so that utterances with the same label have similar encodings. The encoded utterances were evaluated on two datasets, reaching 88.19% accuracy on the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset and 91.66% accuracy on the RML (Ryerson Multimedia Research Lab) dataset.
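
The abstract does not give the formula of the proposed loss, so the following is only a minimal sketch of the general idea it describes: learn encodings in which utterances with the same label end up close together, then classify from the encodings. It assumes a margin-based generalization of triplet loss over several positives and negatives, and a nearest-centroid classifier; both choices, along with the names `multi_sample_margin_loss`, `nearest_centroid_predict`, and the `margin` parameter, are illustrative assumptions and not the authors' method.

```python
import numpy as np


def multi_sample_margin_loss(anchor, positives, negatives, margin=1.0):
    """Hinge loss over every (positive, negative) pair for one anchor.

    Illustrative generalization of triplet loss to several positives and
    negatives. This is an assumption, NOT the formula from the paper.
    """
    # Euclidean distance from the anchor to each same-label encoding.
    d_pos = np.linalg.norm(positives - anchor, axis=1)  # shape (P,)
    # Euclidean distance from the anchor to each other-label encoding.
    d_neg = np.linalg.norm(negatives - anchor, axis=1)  # shape (N,)
    # Penalize any positive that is not closer than a negative by `margin`.
    violations = np.maximum(0.0, d_pos[:, None] - d_neg[None, :] + margin)
    return float(violations.mean())


def nearest_centroid_predict(train_codes, train_labels, query_codes):
    """Label each query encoding with the class of its nearest centroid."""
    classes = np.unique(train_labels)
    # One mean encoding per class.
    centroids = np.stack(
        [train_codes[train_labels == c].mean(axis=0) for c in classes]
    )
    # Pairwise distances between queries and centroids, shape (Q, C).
    dists = np.linalg.norm(
        query_codes[:, None, :] - centroids[None, :, :], axis=2
    )
    return classes[dists.argmin(axis=1)]
```

In such a setup, the loss would be zero once every same-label encoding sits at least `margin` closer to the anchor than every other-label encoding, which is one way to realize "similar encodings for the same label". How the paper actually selects samples, measures distance, and classifies the encodings is not stated in the abstract.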