Addressing Model and Data Heterogeneity in Multimodal Large Language Model Training

Zili Zhang, Yinmin Zhong, Ranchen Ming, Hanpeng Hu, Jianjian Sun, Zheng Ge, Yibo Zhu, Xin Jin
{"title":"解决多模态大语言模型训练中的模型和数据异质性问题","authors":"Zili Zhang, Yinmin Zhong, Ranchen Ming, Hanpeng Hu, Jianjian Sun, Zheng Ge, Yibo Zhu, Xin Jin","doi":"arxiv-2408.04275","DOIUrl":null,"url":null,"abstract":"Multimodal large language models (LLMs) have demonstrated significant\npotential in a wide range of AI applications. Yet, training multimodal LLMs\nsuffers from low efficiency and scalability, due to the inherent model\nheterogeneity and data heterogeneity across different modalities. We present MMScale, an efficient and adaptive framework to reform the\ntraining of multimodal large language models on large-scale clusters. MMScale\nexploits the system characteristics of multimodal LLM training to achieve high\nefficiency and scalability. The core of MMScale is the adaptive resource\nallocation and data-aware reordering techniques to eliminate the model and data\nheterogeneity respectively. We also tailor system optimizations for multimodal\nLLM training to offload certain operations from the GPU training. We evaluate\nMMScale across different sizes of multimodal LLMs on a large-scale production\ncluster with thousands of GPUs. The experimental results show that MMScale\nachieves 54.7% Model FLOPs Utilization (MFU) when training a 72B multimodal LLM\non 1172 GPUs and outperforms Megatron-LM by up to 2.2$\\times$ on throughput.\nThe ablation study shows the main techniques of MMScale are both effective and\nlightweight.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"119 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Addressing Model and Data Heterogeneity in Multimodal Large Language Model Training\",\"authors\":\"Zili Zhang, Yinmin Zhong, Ranchen Ming, Hanpeng Hu, Jianjian Sun, Zheng Ge, Yibo Zhu, Xin Jin\",\"doi\":\"arxiv-2408.04275\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimodal large language models (LLMs) have demonstrated significant\\npotential in a wide range of AI applications. Yet, training multimodal LLMs\\nsuffers from low efficiency and scalability, due to the inherent model\\nheterogeneity and data heterogeneity across different modalities. We present MMScale, an efficient and adaptive framework to reform the\\ntraining of multimodal large language models on large-scale clusters. MMScale\\nexploits the system characteristics of multimodal LLM training to achieve high\\nefficiency and scalability. The core of MMScale is the adaptive resource\\nallocation and data-aware reordering techniques to eliminate the model and data\\nheterogeneity respectively. We also tailor system optimizations for multimodal\\nLLM training to offload certain operations from the GPU training. We evaluate\\nMMScale across different sizes of multimodal LLMs on a large-scale production\\ncluster with thousands of GPUs. 
The experimental results show that MMScale\\nachieves 54.7% Model FLOPs Utilization (MFU) when training a 72B multimodal LLM\\non 1172 GPUs and outperforms Megatron-LM by up to 2.2$\\\\times$ on throughput.\\nThe ablation study shows the main techniques of MMScale are both effective and\\nlightweight.\",\"PeriodicalId\":501422,\"journal\":{\"name\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"volume\":\"119 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.04275\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.04275","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Multimodal large language models (LLMs) have demonstrated significant potential in a wide range of AI applications. Yet, training multimodal LLMs suffers from low efficiency and scalability, due to the inherent model heterogeneity and data heterogeneity across different modalities. We present MMScale, an efficient and adaptive framework to reform the training of multimodal large language models on large-scale clusters. MMScale exploits the system characteristics of multimodal LLM training to achieve high efficiency and scalability. The core of MMScale is the adaptive resource allocation and data-aware reordering techniques to eliminate the model and data heterogeneity respectively. We also tailor system optimizations for multimodal LLM training to offload certain operations from the GPU training. We evaluate MMScale across different sizes of multimodal LLMs on a large-scale production cluster with thousands of GPUs. The experimental results show that MMScale achieves 54.7% Model FLOPs Utilization (MFU) when training a 72B multimodal LLM on 1172 GPUs and outperforms Megatron-LM by up to 2.2× on throughput. The ablation study shows the main techniques of MMScale are both effective and lightweight.
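The abstract names data-aware reordering as the mechanism for removing data heterogeneity but does not describe how it works. The sketch below is only a minimal illustration, assuming the goal is to balance variable-length multimodal samples (text tokens plus vision-encoder patches) across data-parallel ranks; the Sample class and reorder_for_balance function are hypothetical names for this sketch, not MMScale's API. For reference, MFU is the standard ratio of sustained model FLOPs throughput to the aggregate peak FLOPs of the GPUs, so 54.7% means the cluster sustains a bit over half of its theoretical compute.

# Hypothetical sketch of data-aware reordering; the actual MMScale algorithm
# is not specified in the abstract. Assumption: per-sample compute scales with
# the total sequence length fed to the LLM, and we want each data-parallel
# rank to receive roughly the same total workload per step.
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    text_tokens: int      # tokens from the text tokenizer
    image_patches: int    # visual tokens from the vision encoder

    @property
    def cost(self) -> int:
        # Proxy for per-sample compute cost.
        return self.text_tokens + self.image_patches


def reorder_for_balance(samples: List[Sample], num_ranks: int) -> List[List[Sample]]:
    """Greedily assign samples (largest first) to the currently lightest rank."""
    buckets: List[List[Sample]] = [[] for _ in range(num_ranks)]
    loads = [0] * num_ranks
    for s in sorted(samples, key=lambda s: s.cost, reverse=True):
        r = loads.index(min(loads))   # lightest-loaded rank so far
        buckets[r].append(s)
        loads[r] += s.cost
    return buckets


if __name__ == "__main__":
    batch = [Sample(120, 576), Sample(900, 0), Sample(40, 1152), Sample(300, 576)]
    for rank, bucket in enumerate(reorder_for_balance(batch, num_ranks=2)):
        print(rank, [s.cost for s in bucket], sum(s.cost for s in bucket))

This greedy longest-first assignment is just one way to approximate load balance; the paper's actual reordering policy, and how it interacts with adaptive resource allocation across the vision encoder and the LLM backbone, cannot be recovered from the abstract alone.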