{"title":"FAA-CLIP: CLIP的联邦对抗适应","authors":"Yihang Wu;Ahmad Chaddad;Christian Desrosiers;Tareef Daqqaq;Reem Kateb","doi":"10.1109/JIOT.2025.3545574","DOIUrl":null,"url":null,"abstract":"Despite the remarkable performance of vision language models (VLMs), such as contrastive language image pretraining (CLIP), the large size of these models is a considerable obstacle to their use in federated learning (FL) systems where the parameters of local client models need to be transferred to a global server for aggregation. Another challenge in FL is the heterogeneity of data from different clients, which affects the generalization performance of the solution. In addition, natural pretrained VLMs exhibit poor generalization ability in the medical datasets, suggests there exists a domain gap. To solve these issues, we introduce a novel method for the federated adversarial adaptation (FAA) of CLIP. Our method, named FAA-CLIP, handles the large communication costs of CLIP using a lightweight feature adaptation module (FAM) for aggregation, effectively adapting this VLM to each client’s data while greatly reducing the number of parameters to transfer. By keeping CLIP frozen and only updating the FAM parameters, our method is also computationally efficient. Unlike existing approaches, our FAA-CLIP method directly addresses the problem of domain shifts across clients via a domain adaptation (DA) module. This module employs a domain classifier to predict if a given sample is from the local client or the global server, allowing the model to learn domain-invariant representations. Extensive experiments on six different datasets containing both natural and medical images demonstrate that FAA-CLIP can generalize well on both natural and medical datasets compared to recent FL approaches. Our codes are available at <uri>https://github.com/AIPMLab/FAA-CLIP</uri>.","PeriodicalId":54347,"journal":{"name":"IEEE Internet of Things Journal","volume":"12 12","pages":"21091-21102"},"PeriodicalIF":8.9000,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FAA-CLIP: Federated Adversarial Adaptation of CLIP\",\"authors\":\"Yihang Wu;Ahmad Chaddad;Christian Desrosiers;Tareef Daqqaq;Reem Kateb\",\"doi\":\"10.1109/JIOT.2025.3545574\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Despite the remarkable performance of vision language models (VLMs), such as contrastive language image pretraining (CLIP), the large size of these models is a considerable obstacle to their use in federated learning (FL) systems where the parameters of local client models need to be transferred to a global server for aggregation. Another challenge in FL is the heterogeneity of data from different clients, which affects the generalization performance of the solution. In addition, natural pretrained VLMs exhibit poor generalization ability in the medical datasets, suggests there exists a domain gap. To solve these issues, we introduce a novel method for the federated adversarial adaptation (FAA) of CLIP. Our method, named FAA-CLIP, handles the large communication costs of CLIP using a lightweight feature adaptation module (FAM) for aggregation, effectively adapting this VLM to each client’s data while greatly reducing the number of parameters to transfer. By keeping CLIP frozen and only updating the FAM parameters, our method is also computationally efficient. 
Unlike existing approaches, our FAA-CLIP method directly addresses the problem of domain shifts across clients via a domain adaptation (DA) module. This module employs a domain classifier to predict if a given sample is from the local client or the global server, allowing the model to learn domain-invariant representations. Extensive experiments on six different datasets containing both natural and medical images demonstrate that FAA-CLIP can generalize well on both natural and medical datasets compared to recent FL approaches. Our codes are available at <uri>https://github.com/AIPMLab/FAA-CLIP</uri>.\",\"PeriodicalId\":54347,\"journal\":{\"name\":\"IEEE Internet of Things Journal\",\"volume\":\"12 12\",\"pages\":\"21091-21102\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-02-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Internet of Things Journal\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10902405/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Internet of Things Journal","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10902405/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
FAA-CLIP: Federated Adversarial Adaptation of CLIP
Despite the remarkable performance of vision-language models (VLMs) such as contrastive language-image pretraining (CLIP), the large size of these models is a considerable obstacle to their use in federated learning (FL) systems, where the parameters of local client models need to be transferred to a global server for aggregation. Another challenge in FL is the heterogeneity of data across clients, which affects the generalization performance of the solution. In addition, VLMs pretrained on natural images generalize poorly to medical datasets, suggesting a domain gap. To address these issues, we introduce a novel method for the federated adversarial adaptation (FAA) of CLIP. Our method, named FAA-CLIP, handles the large communication costs of CLIP using a lightweight feature adaptation module (FAM) for aggregation, effectively adapting this VLM to each client's data while greatly reducing the number of parameters to transfer. By keeping CLIP frozen and updating only the FAM parameters, our method is also computationally efficient. Unlike existing approaches, FAA-CLIP directly addresses domain shifts across clients via a domain adaptation (DA) module. This module employs a domain classifier to predict whether a given sample comes from the local client or the global server, allowing the model to learn domain-invariant representations. Extensive experiments on six datasets containing both natural and medical images demonstrate that FAA-CLIP generalizes well on both natural and medical data compared to recent FL approaches. Our code is available at https://github.com/AIPMLab/FAA-CLIP.
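As a rough illustration of the two components described in the abstract, the PyTorch sketch below shows a lightweight residual feature adaptation module trained on top of frozen CLIP image features, a domain classifier attached through a gradient reversal layer (the standard adversarial domain adaptation construction), and a FedAvg-style aggregation of only the FAM parameters. This is a minimal sketch under our own assumptions about module sizes and naming (FeatureAdaptationModule, DomainClassifier, and aggregate_fam are hypothetical), not the authors' released implementation; see the linked repository for the actual code.

import copy
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; negated (scaled) gradient in the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class FeatureAdaptationModule(nn.Module):
    # Small residual MLP applied to frozen CLIP image features; only these
    # parameters are trained locally and exchanged with the server.
    def __init__(self, dim=512):
        super().__init__()
        self.adapter = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, feats):
        return feats + self.adapter(feats)

class DomainClassifier(nn.Module):
    # Predicts whether an adapted feature comes from the local client (label 0)
    # or the global/server feature distribution (label 1); the gradient reversal
    # layer turns this into an adversarial signal that pushes the FAM toward
    # domain-invariant representations.
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 2))

    def forward(self, feats, lambd=1.0):
        return self.net(GradReverse.apply(feats, lambd))

@torch.no_grad()
def aggregate_fam(client_fams, weights):
    # FedAvg-style weighted averaging of FAM parameters only; the frozen CLIP
    # backbone is never transmitted.
    global_fam = copy.deepcopy(client_fams[0])
    new_state = {}
    for name in global_fam.state_dict():
        new_state[name] = sum(w * fam.state_dict()[name] for fam, w in zip(client_fams, weights))
    global_fam.load_state_dict(new_state)
    return global_fam

In such a setup, each client would encode its images with the frozen CLIP encoder, pass the features through its local FAM, and optimize a task loss (e.g., similarity to class-prompt text embeddings) plus the cross-entropy loss of the domain classifier, then upload only the FAM weights for server-side aggregation; the exact losses and training schedule in FAA-CLIP may differ.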
Journal introduction:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols (such as network coding), and IoT services and applications. Topics encompass IoT's impact on sensor technologies, big data management, and future Internet design for applications such as smart cities and smart homes. Fields of interest include: IoT architecture, such as things-centric, data-centric, and service-oriented IoT architecture; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standards development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.