Survey of Multimodal Federated Learning: Exploring Data Integration, Challenges, and Future Directions

Impact factor: 6.3; JCR Q1 (Engineering, Electrical & Electronic)

IEEE Open Journal of the Communications Society. Pub Date: 2025-03-26. DOI: 10.1109/OJCOMS.2025.3554537

Mumin Adam; Abdullatif Albaseer; Uthman Baroudi; Mohamed Abdallah

Volume 6, pages 2510-2538. Full text: https://ieeexplore.ieee.org/document/10938626/

Citations: 0

Abstract

The rapidly expanding demand for intelligent wireless applications and the Internet of Things (IoT) requires advanced system designs to handle multimodal data effectively while ensuring user privacy and data security. Traditional machine learning (ML) models rely on centralized architectures, which, while powerful, often present significant privacy risks due to the centralization of sensitive data. Federated Learning (FL) is a promising decentralized alternative for addressing these issues. However, FL predominantly handles unimodal data, which limits its applicability in environments where devices collect and process various data types such as text, images, and sensor output. To address this limitation, Multimodal FL (MMFL) integrates multiple data modalities, enabling a richer and more holistic understanding of data. In this survey, we explore the challenges and advancements in MMFL, including data representation, fusion techniques, and cross-modal learning strategies. We present a comprehensive taxonomy of MMFL, outlining critical challenges such as modality imbalance, fusion complexity, and security concerns. Additionally, we highlight the role of transformers in MMFL by leveraging their powerful attention mechanisms to process multimodal data in a federated setting. Finally, we discuss various applications of MMFL, including healthcare, human activity recognition, and emotion recognition, and propose future research directions for improving the scalability and robustness of MMFL systems in real-world scenarios.

Source journal

IEEE Open Journal of the Communications Society Multiple-

CiteScore: 13.70

Self-citation rate: 3.80%

Review time: 10 weeks

Journal description: The IEEE Open Journal of the Communications Society (OJ-COMS) is an open access, all-electronic journal that publishes original high-quality manuscripts on advances in the state of the art of telecommunications systems and networks. The papers in IEEE OJ-COMS are included in Scopus. Submissions reporting new theoretical findings (including novel methods, concepts, and studies) and practical contributions (including experiments and development of prototypes) are welcome. Additionally, survey and tutorial articles are considered. The IEEE OJ-COMS received its debut impact factor of 7.9 according to the Journal Citation Reports (JCR) 2023.

The IEEE Open Journal of the Communications Society covers science, technology, applications and standards for information organization, collection and transfer using electronic, optical and wireless channels and networks. Some specific areas covered include: systems and network architecture, control and management; protocols, software, and middleware; quality of service, reliability, and security; modulation, detection, coding, and signaling; switching and routing; mobile and portable communications; terminals and other end-user devices; networks for content distribution and distributed computing; communications-based distributed resources control.