Survey of Multimodal Federated Learning: Exploring Data Integration, Challenges, and Future Directions
Authors: Mumin Adam; Abdullatif Albaseer; Uthman Baroudi; Mohamed Abdallah
Journal: IEEE Open Journal of the Communications Society, vol. 6, pp. 2510-2538 (Journal Article, published 2025-03-26)
Impact factor: 6.3 (JCR Q1, Engineering, Electrical & Electronic)
DOI: 10.1109/OJCOMS.2025.3554537
URL: https://ieeexplore.ieee.org/document/10938626/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10938626
The rapidly expanding demand for intelligent wireless applications and the Internet of Things (IoT) requires advanced system designs to handle multimodal data effectively while ensuring user privacy and data security. Traditional machine learning (ML) models rely on centralized architectures, which, while powerful, often present significant privacy risks due to the centralization of sensitive data. Federated Learning (FL) is a promising decentralized alternative for addressing these issues. However, FL predominantly handles unimodal data, which limits its applicability in environments where devices collect and process various data types such as text, images, and sensor output. To address this limitation, Multimodal FL (MMFL) integrates multiple data modalities, enabling a richer and more holistic understanding of data. In this survey, we explore the challenges and advancements in MMFL, including data representation, fusion techniques, and cross-modal learning strategies. We present a comprehensive taxonomy of MMFL, outlining critical challenges such as modality imbalance, fusion complexity, and security concerns. Additionally, we highlight the role of transformers in MMFL by leveraging their powerful attention mechanisms to process multimodal data in a federated setting. Finally, we discuss various applications of MMFL, including healthcare, human activity recognition, and emotion recognition, and propose future research directions for improving the scalability and robustness of MMFL systems in real-world scenarios.
About the journal:
The IEEE Open Journal of the Communications Society (OJ-COMS) is an open access, all-electronic journal that publishes original high-quality manuscripts on advances in the state of the art of telecommunications systems and networks. The papers in IEEE OJ-COMS are included in Scopus. Submissions reporting new theoretical findings (including novel methods, concepts, and studies) and practical contributions (including experiments and development of prototypes) are welcome. Additionally, survey and tutorial articles are considered. IEEE OJ-COMS received its debut impact factor of 7.9 according to the Journal Citation Reports (JCR) 2023.
The IEEE Open Journal of the Communications Society covers science, technology, applications and standards for information organization, collection and transfer using electronic, optical and wireless channels and networks. Some specific areas covered include:
Systems and network architecture, control and management
Protocols, software, and middleware
Quality of service, reliability, and security
Modulation, detection, coding, and signaling
Switching and routing
Mobile and portable communications
Terminals and other end-user devices
Networks for content distribution and distributed computing
Communications-based distributed resources control.