Xingcan Liang, Linsen Xu, Zhipeng Liu, Xiang Sui, Jinfu Liu
Proceedings of the 2023 3rd International Conference on Robotics and Control Engineering. Published 2023-05-12. DOI: 10.1145/3598151.3598176 (https://doi.org/10.1145/3598151.3598176)
Noise-label Suppressed Module for Speech Emotion Recognition
Speech emotion recognition (SER) has attracted growing interest owing to its broad range of applications. Utterances are often split into segments to increase the amount of training data for SER, but each segment inherits the label of its parent utterance, and that inherited label may not match the emotion the segment actually expresses, which can degrade performance. In this paper, we propose a robust noise-label-suppressed module that relabels segments to suppress the adverse effects of inherited labels. First, log-Mel spectrogram segments of the speech are computed together with their deltas and delta-deltas. Then, a feature extraction model extracts speech features from the resulting 3-D data. Finally, a relabel model corrects the label of each segment. Experimental results on the IEMOCAP dataset show that the proposed noise-label-suppressed module outperforms other advanced methods and performs robustly.
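The three steps in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the abstract does not give the segment length, the hop size, the delta formula, or the relabeling criterion, so `seg_len`, `hop`, the first-difference delta, and the `margin` rule in `relabel` are all assumptions chosen for illustration. The log-Mel frames are taken as given input, and the class probabilities passed to `relabel` stand in for the output of some feature extraction model.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # one time frame of log-Mel energies, one value per Mel band
Channels = Tuple[List[Frame], List[Frame], List[Frame]]  # static, delta, delta-delta


def first_difference(frames: Sequence[Frame]) -> List[Frame]:
    """Delta as frame[t] - frame[t-1], with an all-zero frame at t = 0.

    Assumption: the abstract does not say how deltas are computed;
    regression-based deltas over a wider window are also common.
    """
    if not frames:
        return []
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for c, p in zip(cur, prev)])
    return out


def segment_utterance(
    log_mel: Sequence[Frame], label: int, seg_len: int, hop: int
) -> List[Tuple[Channels, int]]:
    """Split one utterance into fixed-length segments with 3 channels each.

    Every segment inherits the utterance-level label, which is the
    source of label noise the abstract describes. Trailing frames that
    do not fill a whole segment are dropped (an assumption).
    """
    static = [list(frame) for frame in log_mel]
    delta = first_difference(static)
    delta2 = first_difference(delta)
    segments = []
    for start in range(0, len(static) - seg_len + 1, hop):
        end = start + seg_len
        channels = (static[start:end], delta[start:end], delta2[start:end])
        segments.append((channels, label))
    return segments


def relabel(inherited: int, probs: Sequence[float], margin: float = 0.2) -> int:
    """Hypothetical relabeling rule (not taken from the paper).

    Switch to the most probable class only when its probability exceeds
    that of the inherited label by more than `margin`; otherwise keep
    the inherited label.
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    if probs[best] - probs[inherited] > margin:
        return best
    return inherited
```

For example, calling `segment_utterance(frames, label=2, seg_len=4, hop=3)` on a 10-frame utterance yields three overlapping segments that all carry label 2; `relabel` would then be applied per segment once a model has produced class probabilities for it. A margin-based rule is only one plausible way to decide when a prediction should override an inherited label.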