The Synergy Between Data and Multi-Modal Large Language Models: A Survey From Co-Development Perspective

Zhen Qin, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 10, pp. 8415-8434. Published 2025-06-06. DOI: 10.1109/TPAMI.2025.3576835

Abstract

Recent years have witnessed the rapid development of large language models (LLMs). Multi-modal LLMs (MLLMs) extend the modalities they handle from text to various other domains and have attracted wide attention because of their diverse application scenarios. Because LLMs and MLLMs depend on massive numbers of parameters and vast amounts of data to achieve emergent capabilities, the importance of data is increasingly recognized. Reviewing recent data-driven work on MLLMs, we find that model development and data development are not two separate paths but are interconnected: larger and higher-quality data improve MLLM performance, while MLLMs, in turn, facilitate the development of data. The co-development of multi-modal data and MLLMs requires a clear view of 1) the MLLM development stages at which specific data-centric approaches can be employed to enhance particular MLLM capabilities, and 2) the specific roles in which MLLMs can use these capabilities to contribute to multi-modal data. To promote data-model co-development in the MLLM community, we systematically review existing work on MLLMs from the data-model co-development perspective.
