MSV: Contribution of Modalities based on the Shapley Value

2024 IEEE International Conference on Consumer Electronics (ICCE) Pub Date : 2024-01-06 DOI:10.1109/ICCE59016.2024.10444313

Jangyeong Jeon, Jungeun Kim, Jinwoo Park, Junyeong Kim



Abstract

With the rapid development of deep learning, real-world applications have given rise to more complex tasks, shifting the focus from single-modality learning to multimodal comprehension. As a result, the need for models that can handle comprehensive information from multimodal datasets has grown. In multimodal tasks, proper interaction and fusion between modalities such as language, vision, sensory, and text play an important role in accurate prediction and identification. It is therefore of utmost importance to detect flaws introduced by the individual modalities when all modalities are combined. However, the complex, opaque, black-box nature of these models makes it challenging to understand how a model works and what impact each modality has, especially in complicated multimodal tasks. To address this issue, we directly adopt the method presented in previous works and apply it to the Visual Commonsense Generation task to quantify the contribution of different modalities. In this paper, we introduce the Contribution of Modalities based on the Shapley Value (MSV) score, a metric designed to measure the marginal contribution of each modality. Drawing inspiration from previous studies that applied the Shapley value to modalities, we extend its application to the Visual Commonsense Generation task. In experiments conducted on three modal tasks, our score offers enhanced interpretability for the multimodal model.
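The abstract does not specify how the MSV score is computed, so the following is only a minimal sketch of the standard Shapley value applied to modalities, not the authors' implementation. The function `shapley_values`, the modality names, and the numbers in `toy_scores` are all hypothetical and chosen for illustration; in practice the value function would be a task metric evaluated with only the given subset of modalities available.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player under the set function `value`.

    `value` maps a frozenset of players to a number (e.g. a task score).
    """
    n = len(players)
    scores = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        scores[p] = total
    return scores


# Hypothetical task scores for each subset of available modalities
# (made-up numbers for illustration, not results from the paper).
toy_scores = {
    frozenset(): 0.0,
    frozenset({"vision"}): 0.4,
    frozenset({"text"}): 0.5,
    frozenset({"vision", "text"}): 0.8,
}

contributions = shapley_values(["vision", "text"], toy_scores.__getitem__)
print(contributions)
```

With these toy numbers, the vision and text contributions come out to roughly 0.35 and 0.45, and they sum to the gap between the full-input score and the empty-input score, which is the efficiency property that makes the Shapley value a natural way to split a model's performance among its modalities. Exact enumeration costs 2^n evaluations of the value function, which stays cheap for the handful of modalities typical of multimodal tasks.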

