Diffusion Patch Attack With Spatial–Temporal Cross-Evolution for Video Recognition
Jian Yang, Zhiyu Guan, Jun Li, Zhiping Shi, Xianglong Liu
IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 12, pp. 13190-13200
Publication date: 2024-08-30
DOI: 10.1109/TCSVT.2024.3452475
URL: https://ieeexplore.ieee.org/document/10659878/
Impact factor: 8.3; JCR: Q1 (Engineering, Electrical & Electronic)
Citation count: 0
Abstract
Deep neural networks (DNNs) have demonstrated excellent performance across various domains. However, recent studies have shown that DNNs, including DNN-based video action recognition models, are vulnerable to adversarial examples. While much of the existing research on adversarial attacks against video models focuses on perturbation-based attacks, research on patch-based black-box attacks remains limited. Existing patch-based attack algorithms must optimize over a large search space and use patches with simple content, which leads to suboptimal attack performance or requires a large number of queries. To address these challenges, we propose the Diffusion Patch Attack (DPA) with Spatial-Temporal Cross-Evolution (STCE) for video recognition, a novel approach that, for the first time, integrates a diffusion model into video black-box adversarial attacks. This integration significantly narrows the parameter search space while enhancing the adversarial content of patches. Moreover, we introduce the spatial-temporal cross-evolution algorithm to suit the narrowed search space. Specifically, we separate the spatial and temporal parameters and then apply an alternating evolutionary strategy to each parameter type. Extensive experiments on three widely used video action recognition models (C3D, NL, and TPN) and two benchmark datasets (UCF-101 and HMDB-51) demonstrate that our approach outperforms other state-of-the-art black-box patch attack algorithms.
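The abstract only outlines the idea of separating spatial and temporal parameters and evolving them in alternation, so the following is a minimal sketch of that general pattern, not the authors' STCE algorithm. The function names, the (1+λ) mutate-and-select step, the parameter ranges, and the toy score function are all assumptions made for illustration; in a real black-box attack, `score` would be replaced by queries to the target video model.

```python
import random


def alternating_evolution(score, spatial_init, temporal_init,
                          rounds=20, pop=8, sigma=0.1, seed=0):
    """Alternately evolve two parameter groups against a black-box score.

    Hypothetical illustration: even rounds mutate only the spatial
    parameters, odd rounds mutate only the temporal ones. Each round is a
    simple (1+lambda) step: sample `pop` Gaussian mutations of the current
    parent and keep the best child only if it improves the score.
    """
    rng = random.Random(seed)
    spatial, temporal = list(spatial_init), list(temporal_init)
    best = score(spatial, temporal)
    queries = 1  # every call to `score` counts as one black-box query

    for r in range(rounds):
        evolve_spatial = (r % 2 == 0)
        parent = spatial if evolve_spatial else temporal

        scored = []
        for _ in range(pop):
            # Mutate and clip each parameter to the assumed range [0, 1].
            child = [min(1.0, max(0.0, p + rng.gauss(0.0, sigma)))
                     for p in parent]
            if evolve_spatial:
                value = score(child, temporal)
            else:
                value = score(spatial, child)
            queries += 1
            scored.append((value, child))

        top_value, top_child = max(scored, key=lambda pair: pair[0])
        if top_value > best:
            best = top_value
            if evolve_spatial:
                spatial = top_child
            else:
                temporal = top_child

    return spatial, temporal, best, queries


def toy_score(spatial, temporal):
    """Stand-in objective (assumption): closeness to a hidden target."""
    target_s, target_t = [0.7, 0.2], [0.5]
    err = sum((a - b) ** 2 for a, b in zip(spatial, target_s))
    err += sum((a - b) ** 2 for a, b in zip(temporal, target_t))
    return -err


if __name__ == "__main__":
    s, t, best, q = alternating_evolution(toy_score, [0.0, 0.0], [0.0])
    print("spatial:", s, "temporal:", t, "score:", best, "queries:", q)
```

Holding one group fixed while mutating the other keeps each search step low-dimensional, which is the intuition the abstract gives for pairing the evolutionary strategy with a narrowed search space. The query counter is included because black-box attacks are typically compared by how many model queries they need.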
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.