{"title":"Breaking the Barrier with a Multi-Domain SER","authors":"Jamalia Sultana, Mahmuda Naznin","doi":"10.1109/COMPSAC54236.2022.00088","DOIUrl":null,"url":null,"abstract":"Voice based interactive system has numerous ap-plications including patient care system, robotics, interactive learning tool etc. Speech Emotion Recognition (SER) is a vital part of any voice based interactive system. Providing an efficient SER framework in multi-lingual domain is highly challenging due to the difficulties in feature extraction from noisy voice signals, language barrier, issues due to gender dependency, domain generalization problem etc. Therefore, all of these challenges have made multi-domain SER interesting to the researchers. In our study, we provide a multi-domain SER framework where popular benchmark corpora have been integrated and used together for training and testing with the goal of removing language barriers and the corpus dependency. Moreover, we have utilized the role of gender on acoustic signal features to improve the performance in multi-domain. We design a hierarchical Convolutional Neural Network (CNN) based framework that finds the influence of genders while recognizing emotions in multi-domain cross-corpus system. We have used Unweighted Average Recall (UAR) for measuring performance in the multi-domain corpus to address data imbalance problem. We validate our proposed framework by conducting extensive experiments with benchmark datasets. The results show that using the proposed gender-based SER model with multi-lingual cross-corpus performs better than the conventional SER models. Our novel multi-domain cross-corpus SER will be very helpful for designing different multi-lingual voice- based interactive applications.","PeriodicalId":330838,"journal":{"name":"2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC54236.2022.00088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Voice-based interactive systems have numerous applications, including patient-care systems, robotics, and interactive learning tools. Speech Emotion Recognition (SER) is a vital part of any such system. Providing an efficient SER framework in a multi-lingual domain is highly challenging due to the difficulty of extracting features from noisy voice signals, language barriers, gender dependency, and the domain-generalization problem. These challenges have made multi-domain SER interesting to researchers. In our study, we provide a multi-domain SER framework in which popular benchmark corpora are integrated and used together for training and testing, with the goal of removing language barriers and corpus dependency. Moreover, we exploit the influence of gender on acoustic signal features to improve performance in the multi-domain setting. We design a hierarchical Convolutional Neural Network (CNN) based framework that captures the influence of gender while recognizing emotions in a multi-domain, cross-corpus system. We use Unweighted Average Recall (UAR) to measure performance on the multi-domain corpus, since it addresses the data-imbalance problem. We validate the proposed framework through extensive experiments on benchmark datasets. The results show that the proposed gender-based SER model with a multi-lingual cross-corpus outperforms conventional SER models. Our multi-domain cross-corpus SER will be helpful for designing multi-lingual voice-based interactive applications.
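The abstract names Unweighted Average Recall (UAR) as the evaluation metric because the merged multi-domain corpus is class-imbalanced. UAR is simply the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has. A minimal Python sketch (the toy labels are illustrative, not from the paper):

```python
# Unweighted Average Recall (UAR): mean of per-class recalls.
# A majority class cannot dominate this score the way it dominates
# plain accuracy, which is why it suits imbalanced emotion corpora.
import numpy as np
from sklearn.metrics import recall_score

def uar(y_true, y_pred):
    """Return the unweighted mean of recall over the classes in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

y_true = [0, 0, 0, 0, 1, 1, 2]   # imbalanced toy labels
y_pred = [0, 0, 0, 0, 1, 0, 2]
print(uar(y_true, y_pred))                            # (1.0 + 0.5 + 1.0)/3 = 0.8333...
# sklearn's macro-averaged recall computes the same quantity:
print(recall_score(y_true, y_pred, average="macro"))  # 0.8333...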
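The abstract describes a hierarchical CNN that accounts for gender while recognizing emotion, but gives no architectural details. The sketch below is therefore only one plausible reading, stated as an assumption: a shared convolutional encoder over spectrogram input, a gender head, and gender-specific emotion heads that samples are routed through. Layer sizes, input shape, and the number of emotion classes are all illustrative.

```python
# Hedged sketch of a hierarchical, gender-aware SER model; all
# architectural choices here are assumptions, not the paper's design.
import torch
import torch.nn as nn

class HierarchicalSER(nn.Module):
    def __init__(self, n_emotions=4):
        super().__init__()
        # Shared convolutional encoder over (batch, 1, mel_bins, frames).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.gender_head = nn.Linear(32, 2)  # two gender classes assumed
        # One emotion classifier per gender branch.
        self.emotion_heads = nn.ModuleList(
            [nn.Linear(32, n_emotions) for _ in range(2)]
        )

    def forward(self, x):
        feats = self.encoder(x)
        gender_logits = self.gender_head(feats)
        branch = gender_logits.argmax(dim=1)
        # Route each sample through the emotion head of its predicted gender.
        emotion_logits = torch.stack(
            [self.emotion_heads[int(b)](f) for b, f in zip(branch, feats)]
        )
        return gender_logits, emotion_logits

model = HierarchicalSER()
spec = torch.randn(8, 1, 64, 128)  # toy batch of log-mel spectrograms
g, e = model(spec)
print(g.shape, e.shape)            # torch.Size([8, 2]) torch.Size([8, 4])
```

At training time, one would typically supervise the gender head with ground-truth gender labels and the emotion heads with emotion labels; at inference the predicted gender selects the branch, which is the hierarchical structure the abstract alludes to.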