MLLM-FL: Multimodal Large Language Model Assisted Federated Learning on Heterogeneous and Long-tailed Data

Jianyi Zhang, Hao Frank Yang, Ang Li, Xin Guo, Pu Wang, Haiming Wang, Yiran Chen, Hai Li
arXiv - CS - Artificial Intelligence, published 2024-09-09. DOI: arxiv-2409.06067 (https://doi.org/arxiv-2409.06067)
Citations: 0

Abstract

Previous studies on federated learning (FL) often encounter performance degradation due to data heterogeneity among clients. In light of recent advances in multimodal large language models (MLLMs) such as GPT-4v and LLaVA, which demonstrate exceptional proficiency in multimodal tasks such as image captioning and multimodal question answering, we introduce a novel federated learning framework, Multimodal Large Language Model Assisted Federated Learning (MLLM-FL), which employs powerful MLLMs at the server end to address the challenges of heterogeneous and long-tailed data. Owing to the advanced cross-modality representation capabilities and extensive open-vocabulary prior knowledge of MLLMs, our framework is adept at harnessing the extensive, yet previously underexploited, open-source data accessible from websites, together with powerful server-side computational resources. Hence, MLLM-FL not only enhances performance but also avoids increasing the risk of privacy leakage and the computational burden on local devices, distinguishing it from prior methodologies. Our framework has three key stages. First, prior to local training on clients' local datasets, we conduct global visual-text pretraining of the model, using the extensive open-source data available online with the assistance of MLLMs. Second, the pretrained model is distributed to the clients for local training. Finally, once the locally trained models are transmitted back to the server, a global alignment is carried out under the supervision of MLLMs to further enhance performance. Experimental evaluations on established benchmarks show that our framework delivers promising performance in typical FL scenarios with data heterogeneity and long-tailed distributions across clients.
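The three stages named in the abstract can be pictured as a simple control flow. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the names `run_round`, `average_weights`, `server_pretrain`, `client_updates`, and `server_align` are hypothetical, the model is reduced to a flat list of floats, and the size-weighted averaging step is a FedAvg-style assumption, since the abstract does not say how the client models are combined on the server.

```python
from typing import Callable, List, Sequence

Weights = List[float]  # toy stand-in for model parameters


def average_weights(client_weights: Sequence[Weights], sizes: Sequence[int]) -> Weights:
    """Size-weighted average of client weights (FedAvg-style; an assumption)."""
    total = sum(sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, sizes)) / total
        for i in range(dim)
    ]


def run_round(
    init: Weights,
    server_pretrain: Callable[[Weights], Weights],
    client_updates: Sequence[Callable[[Weights], Weights]],
    client_sizes: Sequence[int],
    server_align: Callable[[Weights], Weights],
) -> Weights:
    # Stage 1: server-side pretraining on public data
    # (MLLM-assisted in the paper; a placeholder callable here).
    pretrained = server_pretrain(init)
    # Stage 2: each client trains locally, starting from a copy of the pretrained weights.
    local = [update(list(pretrained)) for update in client_updates]
    # Aggregation is not described in the abstract; a weighted average is assumed.
    merged = average_weights(local, client_sizes)
    # Stage 3: server-side global alignment (MLLM-supervised in the paper; placeholder here).
    return server_align(merged)


if __name__ == "__main__":
    result = run_round(
        init=[0.0, 0.0],
        server_pretrain=lambda w: [x + 1.0 for x in w],
        client_updates=[
            lambda w: [x + 1.0 for x in w],
            lambda w: [x - 1.0 for x in w],
        ],
        client_sizes=[3, 1],
        server_align=lambda w: [x * 2.0 for x in w],
    )
    print(result)  # [3.0, 3.0]
```

In this toy run the lambdas merely stand in for real training steps. The point is the ordering: only model weights travel between server and clients, while the public-data pretraining and the final alignment both stay on the server, which is how the abstract motivates the claim of no added privacy risk or computational burden on local devices.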