ModiFedCat: A multi-modal distillation based federated catalytic framework
Dongdong Li, Zhenqiang Weng, Zhengji Xuan, Zhe Wang
Information Fusion, Volume 124, Article 103378 (published 2025-06-13)
DOI: 10.1016/j.inffus.2025.103378
URL: https://www.sciencedirect.com/science/article/pii/S1566253525004518
Citations: 0
Abstract
The integration of multi-modal data in federated learning systems faces significant challenges in balancing privacy preservation with effective cross-modal correlation learning under strict client isolation constraints. We propose ModiFedCat, a novel curriculum-guided multi-modal federated distillation framework that combines hierarchical knowledge transfer with adaptive training scheduling to enhance client-side model performance while maintaining rigorous data privacy. Our method computes multi-modal knowledge distillation losses at both the feature-extraction and output layers, ensuring that local models remain aligned with the global model throughout training. Additionally, we introduce a catalyst strategy that dynamically schedules the integration of the distillation loss. By initially training the global model without distillation, we determine the optimal timing for its introduction, thereby maximizing the effectiveness of knowledge transfer once local models have stabilized. Experimental results on three benchmark datasets, AV-MNIST, MM-IMDB, and MIMIC-III, demonstrate that ModiFedCat outperforms existing multi-modal federated learning methods. The proposed framework significantly improves the fusion capability of multi-modal models while maintaining client data privacy. This approach balances local adaptation and global knowledge integration, making it a robust solution for multi-modal federated learning scenarios.
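The abstract describes two mechanisms: distillation losses computed at both the feature-extraction and output layers, and a catalyst schedule that switches the distillation term on only after local models have stabilized. A minimal numpy sketch of those two ideas follows; it is not the paper's implementation, and the loss weight `alpha`, temperature `T`, and the hard on/off schedule in `catalyst_weight` are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax along the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(local_feats, global_feats, local_logits, global_logits,
            T=2.0, alpha=0.5):
    """Two-level distillation loss: MSE between intermediate features
    plus KL divergence between temperature-softened output
    distributions of the global (teacher) and local (student) models."""
    feat_term = np.mean(
        (np.asarray(local_feats) - np.asarray(global_feats)) ** 2)
    p = softmax(global_logits, T)  # teacher distribution (global model)
    q = softmax(local_logits, T)   # student distribution (local model)
    kl_term = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))) / len(p)
    # T**2 rescales gradients as in standard softened-logit distillation.
    return alpha * feat_term + (1 - alpha) * (T ** 2) * kl_term

def catalyst_weight(round_idx, start_round):
    """Catalyst gate: keep the distillation term off during early
    rounds, then enable it once local models have stabilized."""
    return 1.0 if round_idx >= start_round else 0.0
```

A local update would then minimize `task_loss + catalyst_weight(r, r0) * kd_loss(...)`, so early rounds train on the task loss alone and distillation is introduced at round `r0`.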
Journal overview:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Both papers presenting fundamental theoretical analyses and those demonstrating application to real-world problems are welcome.