MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality

Ruiting Dai, Yuqiao Tan, Lisi Mo, Tao He, Ke Qin, Shuang Liang
arXiv - CS - Artificial Intelligence, published 2024-09-07
DOI: arxiv-2409.04693 (https://doi.org/arxiv-2409.04693)

Abstract

Prompt learning has recently attracted considerable attention for its success in various Vision-Language (VL) tasks. However, existing prompt-based models focus primarily on prompt generation and prompt strategies under complete-modality settings, which do not accurately reflect real-world scenarios where part of the modality information may be missing. In this paper, we present the first comprehensive investigation into prompt learning behavior when modalities are incomplete, revealing that prompt-based models are highly sensitive to missing modalities. To address this, we propose a novel Multi-step Adaptive Prompt Learning (MuAP) framework, which generates multimodal prompts and performs multi-step prompt tuning, adaptively learning knowledge by iteratively aligning modalities. Specifically, we generate prompts for each modality and devise prompt strategies to integrate them into the Transformer model. We then perform prompt tuning sequentially, in a single stage followed by an alignment stage, allowing each modality's prompt to be learned autonomously and adaptively, thereby mitigating the imbalance in previous works where only the textual prompts are learnable. Extensive experiments demonstrate the effectiveness of MuAP, which achieves significant improvements over the state of the art on all benchmark datasets.
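The abstract gives no implementation details, so the following is only a toy sketch of the general idea of tuning one prompt per modality in alternating steps against a frozen backbone. Everything in it is an assumption made for illustration: the scalar "tokens", the mean-pooling backbone, the squared-error task loss, the squared-difference alignment term, and the names `frozen_backbone`, `predict`, `loss`, and `tune`. It is not the MuAP implementation and does not model missing inputs.

```python
# Toy sketch (pure Python): alternating prompt tuning with a frozen backbone.
# All names, losses, and schedules are hypothetical, not taken from MuAP.

def frozen_backbone(tokens, weight):
    """Frozen 'model': mean-pool scalar tokens, then scale by a fixed weight."""
    return weight * sum(tokens) / len(tokens)

def predict(text_prompt, image_prompt, text_tokens, image_tokens):
    # Learnable prompts are prepended to each modality's fixed input tokens.
    text_feat = frozen_backbone(text_prompt + text_tokens, weight=1.0)
    image_feat = frozen_backbone(image_prompt + image_tokens, weight=0.5)
    return text_feat, image_feat

def loss(text_prompt, image_prompt, sample, align_weight):
    text_feat, image_feat = predict(
        text_prompt, image_prompt, sample["text"], sample["image"]
    )
    task = (text_feat + image_feat - sample["label"]) ** 2
    align = (text_feat - image_feat) ** 2  # assumed alignment term
    return task + align_weight * align

def grad(fn, params, eps=1e-6):
    """Central finite-difference gradient of fn w.r.t. a list of floats."""
    g = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        g.append((fn(up) - fn(down)) / (2 * eps))
    return g

def step(params, g, lr):
    return [p - lr * gi for p, gi in zip(params, g)]

def tune(sample, steps=200, lr=0.5):
    text_prompt, image_prompt = [0.0, 0.0], [0.0, 0.0]
    history = [loss(text_prompt, image_prompt, sample, 1.0)]
    for _ in range(steps):
        # Single stage: update one modality's prompt while the other is fixed.
        g = grad(lambda p: loss(p, image_prompt, sample, 0.0), text_prompt)
        text_prompt = step(text_prompt, g, lr)
        g = grad(lambda p: loss(text_prompt, p, sample, 0.0), image_prompt)
        image_prompt = step(image_prompt, g, lr)
        # Alignment stage: update both prompts with the alignment term on.
        g_t = grad(lambda p: loss(p, image_prompt, sample, 1.0), text_prompt)
        g_v = grad(lambda p: loss(text_prompt, p, sample, 1.0), image_prompt)
        text_prompt = step(text_prompt, g_t, lr)
        image_prompt = step(image_prompt, g_v, lr)
        history.append(loss(text_prompt, image_prompt, sample, 1.0))
    return text_prompt, image_prompt, history

sample = {"text": [1.0, 2.0], "image": [3.0, 1.0], "label": 2.0}
text_prompt, image_prompt, history = tune(sample)
```

In this toy setup both prompts receive gradient updates, in contrast to a setup where only the textual prompt is trainable. The recorded `history` holds the combined task and alignment loss after each round, so it can be inspected to check that the alternating schedule is reducing both terms.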