FedVCPL-Diff: A federated convolutional prototype learning framework with a diffusion model for speech emotion recognition
Ruobing Li, Yifan Feng, Lin Shen, Liuxian Ma, Haojie Zhang, Kun Qian, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller
Information Fusion, Volume 127, Article 103745 (published 2025-09-16). DOI: 10.1016/j.inffus.2025.103745. https://www.sciencedirect.com/science/article/pii/S1566253525008073
Speech Emotion Recognition (SER), a key emotion analysis technology, has shown significant value in various research areas. Previous SER models have achieved good recognition accuracy, but conventional centralised training requires speech data to be collected and processed in one place, which carries a serious risk of privacy leakage. Federated learning (FL) avoids centralised data processing through distributed learning and so offers a route to privacy protection in SER. However, FL faces several challenges in practical applications, including imbalanced data distributions and inconsistent labelling. Furthermore, typical FL frameworks focus on client-side enhancement and neglect optimisation of the server-side aggregation strategy, which can increase the computational load on clients. To address these problems, we propose a novel approach, FedVCPL-Diff. First, for information fusion, we introduce a diffusion model on the server side that generates features in the Valence-Arousal-Dominance (VAD) emotion space; this replaces the typical aggregation framework and effectively promotes global information fusion. Second, for information exchange, we propose a lightweight, personalised FL transmission framework based on the exchange of VAD features. FedVCPL-Diff optimises each local model by updating its data distribution anchors, which not only avoids the privacy risk but also reduces the communication cost. Experimental results show that the framework significantly improves emotion recognition performance over four commonly used FL frameworks, and its overall performance is also significantly better than that of independently trained local models.
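The abstract describes clients that exchange compact VAD-space features (per-class distribution anchors, i.e. prototypes) instead of full model weights. Below is a minimal Python sketch of one such exchange round, not the authors' implementation. It assumes each client already maps utterances to 3-D (valence, arousal, dominance) vectors. The server fuses the uploads with a sample-weighted mean as a plain stand-in for the paper's diffusion model. The blending weight `alpha` and every function name here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]  # (valence, arousal, dominance); assumed embedding
Sample = Tuple[Vec, str]          # (VAD feature, emotion label)


def local_prototypes(samples: List[Sample]) -> Dict[str, Tuple[Vec, int]]:
    """Per-class mean VAD vector and sample count, computed on the client."""
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in samples:
        for i in range(3):
            sums[label][i] += vec[i]
        counts[label] += 1
    return {lab: (tuple(s / counts[lab] for s in sums[lab]), counts[lab]) for lab in sums}


def server_aggregate(uploads: List[Dict[str, Tuple[Vec, int]]]) -> Dict[str, Vec]:
    """Sample-weighted mean of client prototypes per class.

    Hypothetical stand-in: the paper instead uses a diffusion model on the
    server to generate VAD features; this sketch only averages the uploads.
    """
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts: Dict[str, int] = defaultdict(int)
    for upload in uploads:
        for lab, (proto, n) in upload.items():
            for i in range(3):
                sums[lab][i] += proto[i] * n
            counts[lab] += n
    return {lab: tuple(s / counts[lab] for s in sums[lab]) for lab in sums}


def personalised_anchors(local: Dict[str, Tuple[Vec, int]],
                         global_protos: Dict[str, Vec],
                         alpha: float = 0.5) -> Dict[str, Vec]:
    """Blend local and global prototypes (hypothetical rule); classes the
    client has never seen fall back to the global prototype."""
    anchors: Dict[str, Vec] = {}
    for lab, g in global_protos.items():
        if lab in local:
            loc = local[lab][0]
            anchors[lab] = tuple(alpha * a + (1 - alpha) * b for a, b in zip(loc, g))
        else:
            anchors[lab] = g
    return anchors


def classify(x: Vec, anchors: Dict[str, Vec]) -> str:
    """Predict the label of the nearest anchor (squared Euclidean distance)."""
    return min(anchors, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, anchors[c])))


def payload_size(upload: Dict[str, Tuple[Vec, int]]) -> int:
    """Floats sent per round: 3 VAD coordinates plus 1 count per class."""
    return len(upload) * 4


# Toy data (invented for illustration): client B has no "happy" samples.
client_a = [((0.75, 0.5, 0.5), "happy"), ((0.25, 0.5, 0.5), "happy"),
            ((-0.5, -0.5, -0.25), "sad")]
client_b = [((-0.75, -0.5, -0.5), "sad"), ((-0.25, 0.75, 0.5), "angry")]

up_a, up_b = local_prototypes(client_a), local_prototypes(client_b)
global_protos = server_aggregate([up_a, up_b])
anchors_b = personalised_anchors(up_b, global_protos)
print(classify((0.5, 0.25, 0.5), anchors_b))
```

In this toy round client B never observes a "happy" sample, yet it can still label a happy-like input because the global anchor fills the gap. Each upload costs only four floats per class, which illustrates why exchanging prototypes is far cheaper to transmit than exchanging full model parameters.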
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.