Hao Pan, Xiaoli Zhao, Lipeng He, Yicong Shi, Xiaogang Lin
{"title":"A survey of multimodal federated learning: background, applications, and perspectives","authors":"Hao Pan, Xiaoli Zhao, Lipeng He, Yicong Shi, Xiaogang Lin","doi":"10.1007/s00530-024-01422-9","DOIUrl":null,"url":null,"abstract":"<p>Multimodal Federated Learning (MMFL) is a novel machine learning technique that enhances the capabilities of traditional Federated Learning (FL) to support collaborative training of local models using data available in various modalities. With the generation and storage of a vast amount of multimodal data from the internet, sensors, and mobile devices, as well as the rapid iteration of artificial intelligence models, the demand for multimodal models is growing rapidly. While FL has been widely studied in the past few years, most of the existing research was based in unimodal settings. With the hope of inspiring more applications and research within the MMFL paradigm, we conduct a comprehensive review of the progress and challenges in various aspects of state-of-the-art MMFL. Specifically, we analyze the research motivation for MMFL, propose a new classification method of existing research, discuss the available datasets and application scenarios, and put forward perspectives on the opportunities and challenges faced by MMFL.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"45 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Multimedia Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00530-024-01422-9","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Multimodal Federated Learning (MMFL) is a novel machine learning technique that enhances the capabilities of traditional Federated Learning (FL) to support collaborative training of local models using data available in various modalities. With the generation and storage of a vast amount of multimodal data from the internet, sensors, and mobile devices, as well as the rapid iteration of artificial intelligence models, the demand for multimodal models is growing rapidly. While FL has been widely studied in the past few years, most of the existing research was based in unimodal settings. With the hope of inspiring more applications and research within the MMFL paradigm, we conduct a comprehensive review of the progress and challenges in various aspects of state-of-the-art MMFL. Specifically, we analyze the research motivation for MMFL, propose a new classification method of existing research, discuss the available datasets and application scenarios, and put forward perspectives on the opportunities and challenges faced by MMFL.
期刊介绍:
This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.