Diffusion Patch Attack With Spatial–Temporal Cross-Evolution for Video Recognition

Impact Factor 8.3 · CAS Tier 1 (Engineering & Technology) · JCR Q1 (Engineering, Electrical & Electronic)
Jian Yang, Zhiyu Guan, Jun Li, Zhiping Shi, Xianglong Liu
DOI: 10.1109/TCSVT.2024.3452475
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 12, pp. 13190-13200
Published: 2024-08-30 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10659878/
Citations: 0

Abstract

Deep neural networks (DNNs) have demonstrated excellent performance across various domains. However, recent studies have shown that deep neural networks are vulnerable to adversarial examples, including DNN-based video action recognition models. While much of the existing research on adversarial attacks against video models focuses on perturbation-based attacks, there is limited research on patch-based black-box attacks. Existing patch-based attack algorithms suffer from the problem of a large search space of optimization algorithms and use patches with simple content, leading to suboptimal attack performance or requiring a large number of queries. To address these challenges, we propose the “Diffusion Patch Attack (DPA) with Spatial-Temporal Cross-Evolution (STCE) for Video Recognition,” a novel approach that integrates the excellent properties of the diffusion model into video black-box adversarial attacks for the first time. This integration significantly narrows the parameter search space while enhancing the adversarial content of patches. Moreover, we introduce the spatial-temporal cross-evolutionary algorithm to adapt to the narrowed search space. Specifically, we separate the spatial and temporal parameters and then employ an alternate evolutionary strategy for each parameter type. Extensive experiments conducted on three widely used video action recognition models (C3D, NL, and TPN) and two benchmark datasets (UCF-101 and HMDB-51) demonstrate the superior performance of our approach compared to other state-of-the-art black-box patch attack algorithms.
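The abstract's key algorithmic idea is to split the patch parameters into a spatial group and a temporal group, then evolve only one group per generation. The paper's exact operators are not given here, so the following is a minimal sketch under stated assumptions: `score` stands in for a hypothetical black-box fitness returned by querying the video model (higher = more adversarial), and the parameter lists are generic floats (e.g. patch position vs. per-frame parameters). Alternating which group is mutated keeps the effective search dimension per generation small, which is the search-space reduction the abstract attributes to STCE.

```python
import random

def alternating_evolution(score, spatial, temporal,
                          generations=40, pop=8, sigma=1.0, seed=0):
    """Sketch of a spatial-temporal cross-evolution loop.

    score:    hypothetical black-box fitness f(spatial, temporal) -> float
    spatial:  list of floats (e.g. patch x/y placement parameters)
    temporal: list of floats (e.g. per-frame parameters)
    Each generation mutates only ONE parameter group, alternating
    between the two, with greedy (1+lambda)-style selection.
    """
    rng = random.Random(seed)
    best_s, best_t = list(spatial), list(temporal)
    best_f = score(best_s, best_t)
    for g in range(generations):
        mutate_spatial = (g % 2 == 0)           # alternate parameter groups
        for _ in range(pop):
            s, t = list(best_s), list(best_t)
            target = s if mutate_spatial else t
            i = rng.randrange(len(target))
            target[i] += rng.gauss(0.0, sigma)  # Gaussian mutation of one gene
            f = score(s, t)
            if f > best_f:                      # keep the best candidate seen
                best_s, best_t, best_f = s, t, f
    return best_s, best_t, best_f
```

On a toy fitness such as `lambda s, t: -(sum(x * x for x in s) + sum(x * x for x in t))`, the loop steadily drives both groups toward zero while only ever perturbing one group per generation; in the actual attack, each `score` call would correspond to one query of the target video model.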
Source Journal Metrics

CiteScore: 13.80
Self-citation rate: 27.40%
Articles per year: 660
Review time: 5 months
About the Journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. It encourages submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display, and welcomes contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Papers focusing on hardware and software design and implementation are also highly valued.