MMFed: A Multimodal Federated Learning Framework for Heterogeneous Devices
Gang Wang;Yanfeng Zhang;Chenhao Ying;Qinnan Zhang;Zehui Xiong;Jiakang Wang;Ge Yu
IEEE Internet of Things Journal, vol. 12, no. 18, pp. 36893-36907, July 2025, doi: 10.1109/JIOT.2025.3579858
Existing federated learning (FL) frameworks are primarily designed for single-modal data, whereas real-world scenarios require processing multimodal data on heterogeneous devices. This gap makes multimodal training on heterogeneous devices challenging and significantly degrades model training efficiency. To address these issues, we propose MMFed, a multimodal FL framework that integrates a multimodal algorithm with a semi-synchronous training method. The multimodal algorithm trains local autoencoders on different data modalities. By leveraging the similarity of encodings across modalities that share the same data labels, we further train and aggregate these local autoencoders into a global autoencoder, which is then deployed on the blockchain to perform downstream classification tasks. In the semi-synchronous training method, each device updates its parameters independently during a round, and at the end of each round a global aggregation combines the updates from all devices. We empirically evaluate our framework on several multimodal datasets: the Opportunity (Opp) Challenge, mHealth, and UR Fall Detection datasets. Experimental results demonstrate that MMFed outperforms state-of-the-art multimodal frameworks on all three datasets, achieving an average accuracy improvement of 9.07%. Furthermore, in terms of training speed, MMFed clearly outperforms fully synchronous strategies when scaled to a large number of clients.
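The abstract describes two mechanisms: per-modality autoencoders whose encodings are aligned across modalities through shared labels, and a semi-synchronous round in which each device trains independently until a round-end global aggregation. The paper's code is not reproduced here, so the sketch below is only a hypothetical reading of that description: the ModalityAE class, the prototype-based alignment loss, the per-modality parameter averaging, and all client/step counts are illustrative assumptions, not MMFed's actual implementation.

```python
# Illustrative sketch only: the paper publishes no code here, so every name
# (ModalityAE, local_update, prototype alignment, per-modality averaging)
# is a hypothetical reading of the abstract, not MMFed's actual design.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

CODE_DIM, NUM_CLASSES = 8, 3

class ModalityAE(nn.Module):
    """Per-modality autoencoder; encoders of all modalities share CODE_DIM."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, CODE_DIM))
        self.dec = nn.Sequential(nn.Linear(CODE_DIM, 32), nn.ReLU(), nn.Linear(32, in_dim))
    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

def local_update(model, x, y, protos, steps, lam=0.1, lr=1e-2):
    """One client's work within a round. `steps` varies per device: faster
    hardware fits more updates into the same wall-clock round (semi-synchronous)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        z, x_hat = model(x)
        loss = F.mse_loss(x_hat, x)            # reconstruction objective
        if protos is not None:                 # pull same-label codes together across modalities
            loss = loss + lam * F.mse_loss(z, protos[y])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model.state_dict()

def average_states(states):
    """End-of-round aggregation: parameter averaging within same-architecture clients."""
    out = copy.deepcopy(states[0])
    for k in out:
        out[k] = torch.stack([s[k].float() for s in states]).mean(0)
    return out

# Toy setup: modality A on clients 0-1, modality B on clients 2-3,
# with per-client step budgets simulating heterogeneous device speeds.
torch.manual_seed(0)
dims = {"A": 20, "B": 12}
clients = [("A", 5), ("A", 20), ("B", 8), ("B", 40)]   # (modality, steps per round)
data = [(torch.randn(64, dims[m]), torch.randint(0, NUM_CLASSES, (64,))) for m, _ in clients]
models = {m: ModalityAE(d) for m, d in dims.items()}
protos = None                                          # per-class prototype codes

for rnd in range(5):                                   # semi-synchronous rounds
    states = {"A": [], "B": []}
    for (mod, steps), (x, y) in zip(clients, data):
        local = copy.deepcopy(models[mod])             # start from current global weights
        states[mod].append(local_update(local, x, y, protos, steps))
    for mod in models:                                 # round-end global aggregation
        models[mod].load_state_dict(average_states(states[mod]))
    with torch.no_grad():                              # refresh cross-modal label prototypes
        zs = torch.cat([models[m].enc(x) for (m, _), (x, _) in zip(clients, data)])
        ys = torch.cat([y for _, y in data])
        protos = torch.stack([zs[ys == c].mean(0) for c in range(NUM_CLASSES)])
    print(f"round {rnd}: prototype norm {protos.norm():.3f}")
```

Under this reading, "semi-synchronous" means the server never waits on stragglers within a round: a slow device simply contributes fewer local steps (5 versus 40 above) before the fixed round boundary, which is one plausible reason such a strategy scales better than fully synchronous aggregation as the number of clients grows.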
Journal Introduction:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols (such as network coding), and IoT services and applications. Topics encompass IoT's impact on sensor technologies, big data management, and future Internet design for applications such as smart cities and smart homes. Fields of interest include: IoT architecture, such as things-centric, data-centric, and service-oriented IoT architecture; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standards development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.