MSV: Contribution of Modalities based on the Shapley Value

Jangyeong Jeon, Jungeun Kim, Jinwoo Park, Junyeong Kim
DOI: 10.1109/ICCE59016.2024.10444313 (https://doi.org/10.1109/ICCE59016.2024.10444313)
Published in: 2024 IEEE International Conference on Consumer Electronics (ICCE), pp. 1-6
Publication date: 2024-01-06
Citations: 0

Abstract

With the rapid development of deep learning, increasingly complex tasks arising from real-world applications have driven a shift from single-modality learning to multimodal comprehension. Consequently, the need for models that can handle comprehensive information from multimodal datasets has grown. In multimodal tasks, proper interaction and fusion among modalities such as language, vision, sensory signals, and text play an important role in accurate prediction and identification. Detecting the flaws that individual modalities introduce when all modalities are combined is therefore of utmost importance. However, the complex, opaque, black-box nature of these models makes it difficult to understand how a model works and how much each modality affects its output, especially in complicated multimodal tasks. To address this issue, we adopt the method presented in previous works and apply it to the Visual Commonsense Generation task to quantify the contribution of different modalities. In this paper, we introduce the Contribution of Modalities based on the Shapley Value (MSV) score, a metric designed to measure the marginal contribution of each modality. Drawing on previous studies that applied the Shapley value to modalities, we extend its use to the Visual Commonsense Generation task. In experiments conducted on three modal tasks, our score offers enhanced interpretability for the multimodal model.
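The abstract describes the score only as the marginal contribution of each modality based on the Shapley value; it does not give the exact scoring procedure. As a minimal sketch of the general idea, the Python below computes exact Shapley values for a small set of modalities by averaging each modality's marginal gain over all subsets of the others. The function name `shapley_values`, the two modality labels, and the numbers in `TOY_SCORES` are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small set of players (here: modalities).

    `value` maps a frozenset of players to a real-valued score, e.g. a
    task metric obtained when only those modalities are available.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes p
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding p to coalition s
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical scores for every modality subset (NOT results from the paper).
MODALITIES = ("vision", "text")
TOY_SCORES = {
    frozenset(): 0.10,
    frozenset({"vision"}): 0.30,
    frozenset({"text"}): 0.50,
    frozenset({"vision", "text"}): 0.80,
}


def toy_value(subset):
    """Look up the illustrative score of a modality subset."""
    return TOY_SCORES[frozenset(subset)]


contributions = shapley_values(MODALITIES, toy_value)
for name, score in contributions.items():
    print(f"{name}: {score:.2f}")
```

With these toy numbers the two contributions sum to the gap between the full-model score and the empty-input score, which is the efficiency property of the Shapley value. Exact enumeration grows exponentially with the number of modalities, so it is practical only for a handful of them.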