{"title":"语音情感识别的联邦参数有效微调","authors":"Haijiao Chen , Huan Zhao , Zixing Zhang , Keqin Li","doi":"10.1016/j.eswa.2025.128154","DOIUrl":null,"url":null,"abstract":"<div><div>Pre-trained speech models leverage large-scale self-supervised learning to create general speech representations, with fine-tuning on specific tasks like Speech Emotion Recognition (SER) significantly enhancing performance. However, fine-tuning on different datasets necessitates storing full copies of model weights, leading to substantial storage demands and deployment challenges, particularly on resource-constrained devices. Centralized training also poses substantial privacy risks due to direct access to raw data. To address these challenges, we propose a cloud-edge-terminal collaborative paradigm for <u>Fed</u>eral <u>L</u>earning <u>P</u>arameter-<u>E</u>fficient <u>F</u>ine-<u>T</u>uning (FedLPEFT), which harnesses the synergy of cloud and edge computing to drive the development of collaborative SER applications. Specifically, the distributed paradigm of Federated Learning (FL) offers a privacy-preserving schema for collaborative training, and fine-tuning based on pre-trained speech models can improve SER performance. Parameter-Efficient Fine-Tuning (PEFT) embeds trainable layers in the feed-forward layers of pre-trained speech models. By freezing backbone parameters and sharing only a small set of trainable parameters, PEFT reduces communication overhead and enables lightweight interactions. Additionally, our experiments on attribute inference attacks across various pre-trained models show that gender prediction is at chance levels, indicating that the FedLPEFT approach significantly mitigates sensitive information leakage, ensuring robust privacy protection.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"287 ","pages":"Article 128154"},"PeriodicalIF":7.5000,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Federal parameter-efficient fine-tuning for speech emotion recognition\",\"authors\":\"Haijiao Chen , Huan Zhao , Zixing Zhang , Keqin Li\",\"doi\":\"10.1016/j.eswa.2025.128154\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Pre-trained speech models leverage large-scale self-supervised learning to create general speech representations, with fine-tuning on specific tasks like Speech Emotion Recognition (SER) significantly enhancing performance. However, fine-tuning on different datasets necessitates storing full copies of model weights, leading to substantial storage demands and deployment challenges, particularly on resource-constrained devices. Centralized training also poses substantial privacy risks due to direct access to raw data. To address these challenges, we propose a cloud-edge-terminal collaborative paradigm for <u>Fed</u>eral <u>L</u>earning <u>P</u>arameter-<u>E</u>fficient <u>F</u>ine-<u>T</u>uning (FedLPEFT), which harnesses the synergy of cloud and edge computing to drive the development of collaborative SER applications. Specifically, the distributed paradigm of Federated Learning (FL) offers a privacy-preserving schema for collaborative training, and fine-tuning based on pre-trained speech models can improve SER performance. Parameter-Efficient Fine-Tuning (PEFT) embeds trainable layers in the feed-forward layers of pre-trained speech models. 
By freezing backbone parameters and sharing only a small set of trainable parameters, PEFT reduces communication overhead and enables lightweight interactions. Additionally, our experiments on attribute inference attacks across various pre-trained models show that gender prediction is at chance levels, indicating that the FedLPEFT approach significantly mitigates sensitive information leakage, ensuring robust privacy protection.</div></div>\",\"PeriodicalId\":50461,\"journal\":{\"name\":\"Expert Systems with Applications\",\"volume\":\"287 \",\"pages\":\"Article 128154\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2025-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems with Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0957417425017749\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425017749","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Federal parameter-efficient fine-tuning for speech emotion recognition
Pre-trained speech models leverage large-scale self-supervised learning to create general speech representations, and fine-tuning them on specific tasks like Speech Emotion Recognition (SER) significantly enhances performance. However, fine-tuning on different datasets necessitates storing full copies of the model weights, leading to substantial storage demands and deployment challenges, particularly on resource-constrained devices. Centralized training also poses substantial privacy risks due to direct access to raw data. To address these challenges, we propose a cloud-edge-terminal collaborative paradigm for Federated Learning Parameter-Efficient Fine-Tuning (FedLPEFT), which harnesses the synergy of cloud and edge computing to drive the development of collaborative SER applications. Specifically, the distributed paradigm of Federated Learning (FL) offers a privacy-preserving scheme for collaborative training, and fine-tuning based on pre-trained speech models can improve SER performance. Parameter-Efficient Fine-Tuning (PEFT) embeds trainable layers in the feed-forward layers of pre-trained speech models. By freezing the backbone parameters and sharing only a small set of trainable parameters, PEFT reduces communication overhead and enables lightweight interactions. Additionally, our experiments on attribute inference attacks across various pre-trained models show that gender prediction remains at chance level, indicating that the FedLPEFT approach significantly mitigates sensitive-information leakage and ensures robust privacy protection.
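To make the mechanism described in the abstract concrete, the sketch below illustrates the general idea in PyTorch: a small trainable adapter is inserted after a frozen feed-forward block of a pre-trained speech encoder, and only the adapter parameters are extracted for client-server exchange and averaged on the server. The class names (`BottleneckAdapter`, `AdaptedFeedForward`), the bottleneck size, the hidden dimension of 768, and the plain FedAvg aggregation rule are illustrative assumptions, not details taken from the paper, whose exact adapter design and aggregation scheme are not specified in the abstract.

```python
# Minimal, illustrative sketch of PEFT adapters with frozen backbone and
# lightweight federated parameter sharing (assumptions: PyTorch, bottleneck
# adapters, FedAvg aggregation; not the paper's exact implementation).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Small trainable bottleneck inserted after a frozen feed-forward block."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 32):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's output as the baseline.
        return x + self.up(self.act(self.down(x)))


class AdaptedFeedForward(nn.Module):
    """Frozen feed-forward layer of a pre-trained encoder plus a trainable adapter."""

    def __init__(self, ffn: nn.Module, hidden_dim: int):
        super().__init__()
        self.ffn = ffn
        for p in self.ffn.parameters():  # freeze the backbone parameters
            p.requires_grad = False
        self.adapter = BottleneckAdapter(hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.ffn(x))


def trainable_state_dict(model: nn.Module) -> dict:
    """Collect only trainable (adapter) parameters; only these are sent to the
    server, which is what keeps client-server communication lightweight."""
    trainable = {n for n, p in model.named_parameters() if p.requires_grad}
    return {k: v.detach().clone() for k, v in model.state_dict().items() if k in trainable}


def fedavg(client_updates: list) -> dict:
    """Element-wise average of the shared adapter parameters (illustrative rule)."""
    keys = client_updates[0].keys()
    return {k: torch.stack([u[k] for u in client_updates]).mean(dim=0) for k in keys}


if __name__ == "__main__":
    hidden = 768  # assumed hidden size of a wav2vec 2.0-style encoder
    ffn = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
    block = AdaptedFeedForward(ffn, hidden)

    shared = trainable_state_dict(block)
    total = sum(p.numel() for p in block.parameters())
    sent = sum(v.numel() for v in shared.values())
    print(f"parameters sent to server: {sent} / {total} ({100 * sent / total:.2f}%)")
```

With these assumed sizes, the shared adapter accounts for roughly 1% of the block's parameters, which conveys why freezing the backbone and exchanging only adapter weights reduces both storage and communication overhead.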
Journal introduction:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.