ModiFedCat: A multi-modal distillation based federated catalytic framework
Dongdong Li, Zhenqiang Weng, Zhengji Xuan, Zhe Wang
Information Fusion, Volume 124, Article 103378 (published 2025-06-13)
DOI: 10.1016/j.inffus.2025.103378
URL: https://www.sciencedirect.com/science/article/pii/S1566253525004518
Citations: 0
Abstract
The integration of multi-modal data in federated learning systems faces significant challenges in balancing privacy preservation with effective cross-modal correlation learning under strict client isolation constraints. We propose ModiFedCat, a novel curriculum-guided multi-modal federated distillation framework that combines hierarchical knowledge transfer with adaptive training scheduling to enhance client-side model performance while maintaining rigorous data privacy. Our method computes multi-modal knowledge distillation losses at both the feature-extraction and output layers, ensuring that local models remain aligned with the global model throughout training. Additionally, we introduce a catalyst strategy that dynamically schedules the integration of the distillation loss. By initially training the global model without distillation, we determine the optimal timing for its introduction, thereby maximizing the effectiveness of knowledge transfer once local models have stabilized. Experimental results on three benchmark datasets, AV-MNIST, MM-IMDB, and MIMIC-III, demonstrate that ModiFedCat outperforms existing multi-modal federated learning methods. The proposed framework significantly improves the fusion capability of multi-modal models while maintaining client data privacy. This approach balances local adaptation and global knowledge integration, making it a robust solution for multi-modal federated learning scenarios.
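The abstract describes two mechanisms: distillation losses computed at both the feature-extraction and output layers, and a catalyst schedule that switches the distillation term on only after local models have stabilized. A minimal numpy sketch of those two ideas follows; it is not the paper's implementation, and the loss weight `alpha`, temperature `T`, and the hard on/off schedule in `catalyst_weight` are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax along the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(local_feats, global_feats, local_logits, global_logits,
            T=2.0, alpha=0.5):
    """Two-level distillation loss: MSE between intermediate features
    plus KL divergence between temperature-softened output
    distributions of the global (teacher) and local (student) models."""
    feat_term = np.mean(
        (np.asarray(local_feats) - np.asarray(global_feats)) ** 2)
    p = softmax(global_logits, T)  # teacher distribution (global model)
    q = softmax(local_logits, T)   # student distribution (local model)
    kl_term = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))) / len(p)
    # T**2 rescales gradients as in standard softened-logit distillation.
    return alpha * feat_term + (1 - alpha) * (T ** 2) * kl_term

def catalyst_weight(round_idx, start_round):
    """Catalyst gate: keep the distillation term off during early
    rounds, then enable it once local models have stabilized."""
    return 1.0 if round_idx >= start_round else 0.0
```

A local update would then minimize `task_loss + catalyst_weight(r, r0) * kd_loss(...)`, so early rounds train on the task loss alone and distillation is introduced at round `r0`.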
Journal overview:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Both papers presenting fundamental theoretical analyses and those demonstrating application to real-world problems are welcome.