Breaking the Barrier with a Multi-Domain SER

Jamalia Sultana, Mahmuda Naznin
Published in: 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), June 2022
DOI: 10.1109/COMPSAC54236.2022.00088 (https://doi.org/10.1109/COMPSAC54236.2022.00088)
Citations: 0

Abstract

Voice-based interactive systems have numerous applications, including patient-care systems, robotics, and interactive learning tools. Speech Emotion Recognition (SER) is a vital part of any voice-based interactive system. Providing an efficient SER framework in a multi-lingual domain is highly challenging due to the difficulty of extracting features from noisy voice signals, the language barrier, gender dependency, and the domain-generalization problem. These challenges have made multi-domain SER interesting to researchers. In our study, we provide a multi-domain SER framework in which popular benchmark corpora are integrated and used together for training and testing, with the goal of removing language barriers and corpus dependency. Moreover, we exploit the role of gender in acoustic signal features to improve performance in the multi-domain setting. We design a hierarchical Convolutional Neural Network (CNN) based framework that captures the influence of gender while recognizing emotions in a multi-domain, cross-corpus system. We use Unweighted Average Recall (UAR) to measure performance on the multi-domain corpus, addressing the data-imbalance problem. We validate the proposed framework through extensive experiments on benchmark datasets. The results show that the proposed gender-based SER model with a multi-lingual cross-corpus outperforms conventional SER models. Our novel multi-domain cross-corpus SER will be helpful for designing multi-lingual voice-based interactive applications.
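The abstract names two concrete mechanisms without implementation detail: UAR as the evaluation metric for imbalanced corpora, and a gender-aware hierarchical classifier. A minimal sketch of both is below. The UAR formula (the mean of per-class recalls, so every emotion class counts equally regardless of sample count) is standard; the `gender_model` / `emotion_models` interfaces in the routing function are hypothetical placeholders, not the authors' CNN architecture.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """UAR: the mean of per-class recalls. Each class contributes
    equally no matter how many samples it has, which is why UAR is
    preferred over plain accuracy on imbalanced emotion corpora."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

def hierarchical_predict(features, gender_model, emotion_models):
    """Hierarchical routing (hypothetical interface): a gender
    classifier first labels the utterance, then the matching
    gender-specific emotion model produces the final prediction."""
    gender = gender_model(features)           # e.g. "female" / "male"
    return emotion_models[gender](features)   # gender-specific SER model

# Toy example on an imbalanced label set:
y_true = ["happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "sad", "sad"]
# recall(happy) = 2/3, recall(sad) = 1/1 -> UAR = (2/3 + 1) / 2
print(unweighted_average_recall(y_true, y_pred))  # 0.8333...
```

Note that plain accuracy on the toy example is 3/4 = 0.75, while UAR is higher here because the minority class ("sad") is perfectly recalled; on corpora where minority emotions are poorly recognized, UAR drops sharply even when accuracy stays high.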