{"title":"Multiroom Speech Emotion Recognition","authors":"Erez Shalev, I. Cohen","doi":"10.23919/eusipco55093.2022.9909798","DOIUrl":null,"url":null,"abstract":"Automated audio systems, such as speech emotion recognition, can benefit from the ability to work from another room. No research has yet been conducted on the effectiveness of such systems when the sound source originates in a different room than the target system, and the sound has to travel between the rooms through the wall. New advancements in room-impulse-response generators enable a large-scale simulation of audio sources from adjacent rooms and integration into a training dataset. Such a capability improves the performance of datadriven methods such as deep learning. This paper presents the first evaluation of multiroom speech emotion recognition systems. The isolating policies due to COVID-19 presented many cases of isolated individuals suffering emotional difficulties, where such capabilities would be very beneficial. We perform training, with and without an audio simulation generator, and compare the results of three different models on real data recorded in a real multiroom audio scene. We show that models trained without the new generator achieve poor results when presented with multiroom data. We proceed to show that augmentation using the new generator improves the performances for all three models. Our results demonstrate the advantage of using such a generator. Furthermore, testing with two different deep learning architectures shows that the generator improves the results independently of the given architecture.","PeriodicalId":231263,"journal":{"name":"2022 30th European Signal Processing Conference (EUSIPCO)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 30th European Signal Processing Conference (EUSIPCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/eusipco55093.2022.9909798","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Automated audio systems, such as speech emotion recognition, can benefit from the ability to work from another room. No research has yet been conducted on the effectiveness of such systems when the sound source originates in a different room than the target system, and the sound has to travel between the rooms through the wall. New advancements in room-impulse-response generators enable a large-scale simulation of audio sources from adjacent rooms and integration into a training dataset. Such a capability improves the performance of datadriven methods such as deep learning. This paper presents the first evaluation of multiroom speech emotion recognition systems. The isolating policies due to COVID-19 presented many cases of isolated individuals suffering emotional difficulties, where such capabilities would be very beneficial. We perform training, with and without an audio simulation generator, and compare the results of three different models on real data recorded in a real multiroom audio scene. We show that models trained without the new generator achieve poor results when presented with multiroom data. We proceed to show that augmentation using the new generator improves the performances for all three models. Our results demonstrate the advantage of using such a generator. Furthermore, testing with two different deep learning architectures shows that the generator improves the results independently of the given architecture.