Defending Multimodal Fusion Models against Single-Source Adversaries

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Pub Date: 2021-06-01, DOI: 10.1109/CVPR46437.2021.00335

Karren D. Yang, Wan-Yi Lin, M. Barman, Filipe Condessa, Zico Kolter


Citations: 16

Abstract

Beyond achieving high performance across many vision tasks, multimodal models are expected to be robust to single-source faults due to the availability of redundant information between modalities. In this paper, we investigate the robustness of multimodal neural networks against worst-case (i.e., adversarial) perturbations on a single modality. We first show that standard multimodal fusion models are vulnerable to single-source adversaries: an attack on any single modality can overcome the correct information from multiple unperturbed modalities and cause the model to fail. This surprising vulnerability holds across diverse multimodal tasks and necessitates a solution. Motivated by this finding, we propose an adversarially robust fusion strategy that trains the model to compare information coming from all the input sources, detect inconsistencies in the perturbed modality compared to the other modalities, and only allow information from the unperturbed modalities to pass through. Our approach significantly improves on state-of-the-art methods in single-source robustness, achieving gains of 7.8-25.2% on action recognition, 19.7-48.2% on object detection, and 1.6-6.7% on sentiment analysis, without degrading performance on unperturbed (i.e., clean) data.
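The idea of comparing modalities, spotting the inconsistent one, and passing through only the others can be illustrated with a toy sketch. This is not the authors' architecture, which the abstract does not specify. The learned comparison is replaced here by a hand-written distance rule, and the names `mean_fusion`, `consistency_gated_fusion`, `clean`, and `attacked` are hypothetical. The sketch assumes three modalities whose feature vectors roughly agree when unperturbed.

```python
import numpy as np


def mean_fusion(features):
    """Naive fusion: average the per-modality feature vectors."""
    return np.mean(np.asarray(features, dtype=float), axis=0)


def consistency_gated_fusion(features):
    """Toy robust fusion (illustrative assumption, not the paper's method).

    features: array of shape (k, d), one d-dimensional vector per modality.
    The modality that disagrees most with the others is gated out, and the
    remaining modalities are averaged.
    """
    features = np.asarray(features, dtype=float)
    k = features.shape[0]
    # For each modality, compute the mean of the *other* k - 1 modalities.
    others_mean = (features.sum(axis=0) - features) / (k - 1)
    # Inconsistency score: distance between a modality and the rest.
    inconsistency = np.linalg.norm(features - others_mean, axis=1)
    # Hard gate: block the single most inconsistent modality.
    gate = np.ones(k)
    gate[np.argmax(inconsistency)] = 0.0
    return (gate[:, None] * features).sum(axis=0) / gate.sum()


# Three modalities that roughly agree on the feature [1, 0].
clean = np.array([[1.0, 0.0], [1.1, 0.1], [0.9, -0.1]])

# Single-source adversary: only modality 0 is perturbed.
attacked = clean.copy()
attacked[0] = [-5.0, 5.0]

print("mean fusion, attacked: ", mean_fusion(attacked))
print("gated fusion, attacked:", consistency_gated_fusion(attacked))
```

In this toy setting the averaged feature is dragged far from the clean value by the one perturbed modality, while the gated version recovers the average of the two untouched modalities. A hard gate always discards one modality, even on clean data; a trained model would presumably learn a softer, input-dependent gate.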

