{"title":"语音情感识别的联邦参数有效微调","authors":"Haijiao Chen , Huan Zhao , Zixing Zhang , Keqin Li","doi":"10.1016/j.eswa.2025.128154","DOIUrl":null,"url":null,"abstract":"<div><div>Pre-trained speech models leverage large-scale self-supervised learning to create general speech representations, with fine-tuning on specific tasks like Speech Emotion Recognition (SER) significantly enhancing performance. However, fine-tuning on different datasets necessitates storing full copies of model weights, leading to substantial storage demands and deployment challenges, particularly on resource-constrained devices. Centralized training also poses substantial privacy risks due to direct access to raw data. To address these challenges, we propose a cloud-edge-terminal collaborative paradigm for <u>Fed</u>eral <u>L</u>earning <u>P</u>arameter-<u>E</u>fficient <u>F</u>ine-<u>T</u>uning (FedLPEFT), which harnesses the synergy of cloud and edge computing to drive the development of collaborative SER applications. Specifically, the distributed paradigm of Federated Learning (FL) offers a privacy-preserving schema for collaborative training, and fine-tuning based on pre-trained speech models can improve SER performance. Parameter-Efficient Fine-Tuning (PEFT) embeds trainable layers in the feed-forward layers of pre-trained speech models. By freezing backbone parameters and sharing only a small set of trainable parameters, PEFT reduces communication overhead and enables lightweight interactions. Additionally, our experiments on attribute inference attacks across various pre-trained models show that gender prediction is at chance levels, indicating that the FedLPEFT approach significantly mitigates sensitive information leakage, ensuring robust privacy protection.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"287 ","pages":"Article 128154"},"PeriodicalIF":7.5000,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Federal parameter-efficient fine-tuning for speech emotion recognition\",\"authors\":\"Haijiao Chen , Huan Zhao , Zixing Zhang , Keqin Li\",\"doi\":\"10.1016/j.eswa.2025.128154\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Pre-trained speech models leverage large-scale self-supervised learning to create general speech representations, with fine-tuning on specific tasks like Speech Emotion Recognition (SER) significantly enhancing performance. However, fine-tuning on different datasets necessitates storing full copies of model weights, leading to substantial storage demands and deployment challenges, particularly on resource-constrained devices. Centralized training also poses substantial privacy risks due to direct access to raw data. To address these challenges, we propose a cloud-edge-terminal collaborative paradigm for <u>Fed</u>eral <u>L</u>earning <u>P</u>arameter-<u>E</u>fficient <u>F</u>ine-<u>T</u>uning (FedLPEFT), which harnesses the synergy of cloud and edge computing to drive the development of collaborative SER applications. Specifically, the distributed paradigm of Federated Learning (FL) offers a privacy-preserving schema for collaborative training, and fine-tuning based on pre-trained speech models can improve SER performance. Parameter-Efficient Fine-Tuning (PEFT) embeds trainable layers in the feed-forward layers of pre-trained speech models. 
By freezing backbone parameters and sharing only a small set of trainable parameters, PEFT reduces communication overhead and enables lightweight interactions. Additionally, our experiments on attribute inference attacks across various pre-trained models show that gender prediction is at chance levels, indicating that the FedLPEFT approach significantly mitigates sensitive information leakage, ensuring robust privacy protection.</div></div>\",\"PeriodicalId\":50461,\"journal\":{\"name\":\"Expert Systems with Applications\",\"volume\":\"287 \",\"pages\":\"Article 128154\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2025-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems with Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0957417425017749\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425017749","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Federal parameter-efficient fine-tuning for speech emotion recognition
Pre-trained speech models leverage large-scale self-supervised learning to create general speech representations, and fine-tuning them on specific tasks like Speech Emotion Recognition (SER) significantly enhances performance. However, fine-tuning on different datasets necessitates storing full copies of the model weights, leading to substantial storage demands and deployment challenges, particularly on resource-constrained devices. Centralized training also poses substantial privacy risks due to direct access to raw data. To address these challenges, we propose a cloud-edge-terminal collaborative paradigm for Federated Learning Parameter-Efficient Fine-Tuning (FedLPEFT), which harnesses the synergy of cloud and edge computing to drive the development of collaborative SER applications. Specifically, the distributed paradigm of Federated Learning (FL) offers a privacy-preserving scheme for collaborative training, and fine-tuning based on pre-trained speech models can improve SER performance. Parameter-Efficient Fine-Tuning (PEFT) embeds trainable layers in the feed-forward layers of pre-trained speech models. By freezing the backbone parameters and sharing only a small set of trainable parameters, PEFT reduces communication overhead and enables lightweight interactions. Additionally, our experiments on attribute inference attacks across various pre-trained models show that gender prediction remains at chance level, indicating that the FedLPEFT approach significantly mitigates sensitive-information leakage and ensures robust privacy protection.
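To make the mechanism described in the abstract concrete, the sketch below illustrates the general idea in PyTorch: a small trainable adapter is inserted after a frozen feed-forward block of a pre-trained speech encoder, and only the adapter parameters are extracted for client-server exchange and averaged on the server. The class names (`BottleneckAdapter`, `AdaptedFeedForward`), the bottleneck size, the hidden dimension of 768, and the plain FedAvg aggregation rule are illustrative assumptions, not details taken from the paper, whose exact adapter design and aggregation scheme are not specified in the abstract.

```python
# Minimal, illustrative sketch of PEFT adapters with frozen backbone and
# lightweight federated parameter sharing (assumptions: PyTorch, bottleneck
# adapters, FedAvg aggregation; not the paper's exact implementation).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Small trainable bottleneck inserted after a frozen feed-forward block."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 32):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's output as the baseline.
        return x + self.up(self.act(self.down(x)))


class AdaptedFeedForward(nn.Module):
    """Frozen feed-forward layer of a pre-trained encoder plus a trainable adapter."""

    def __init__(self, ffn: nn.Module, hidden_dim: int):
        super().__init__()
        self.ffn = ffn
        for p in self.ffn.parameters():  # freeze the backbone parameters
            p.requires_grad = False
        self.adapter = BottleneckAdapter(hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.ffn(x))


def trainable_state_dict(model: nn.Module) -> dict:
    """Collect only trainable (adapter) parameters; only these are sent to the
    server, which is what keeps client-server communication lightweight."""
    trainable = {n for n, p in model.named_parameters() if p.requires_grad}
    return {k: v.detach().clone() for k, v in model.state_dict().items() if k in trainable}


def fedavg(client_updates: list) -> dict:
    """Element-wise average of the shared adapter parameters (illustrative rule)."""
    keys = client_updates[0].keys()
    return {k: torch.stack([u[k] for u in client_updates]).mean(dim=0) for k in keys}


if __name__ == "__main__":
    hidden = 768  # assumed hidden size of a wav2vec 2.0-style encoder
    ffn = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
    block = AdaptedFeedForward(ffn, hidden)

    shared = trainable_state_dict(block)
    total = sum(p.numel() for p in block.parameters())
    sent = sum(v.numel() for v in shared.values())
    print(f"parameters sent to server: {sent} / {total} ({100 * sent / total:.2f}%)")
```

With these assumed sizes, the shared adapter accounts for roughly 1% of the block's parameters, which conveys why freezing the backbone and exchanging only adapter weights reduces both storage and communication overhead.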
Journal introduction:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.