International Journal of Computer Vision: Latest Articles

A Generalized Contour Vibration Model for Building Extraction
IF 19.5, CAS Zone 2, Computer Science
International Journal of Computer Vision Pub Date: 2025-05-22 DOI: 10.1007/s11263-025-02468-6
Chunyan Xu, Shuaizhen Yao, Ziqiang Xu, Zhen Cui, Jian Yang
{"title":"A Generalized Contour Vibration Model for Building Extraction","authors":"Chunyan Xu, Shuaizhen Yao, Ziqiang Xu, Zhen Cui, Jian Yang","doi":"10.1007/s11263-025-02468-6","DOIUrl":"https://doi.org/10.1007/s11263-025-02468-6","url":null,"abstract":"<p>Classic active contour models (ACMs) are becoming a great promising solution to the contour-based object extraction with the progress of deep learning recently. Inspired by the wave vibration theory in physics, we propose a Generalized Contour Vibration Model (G-CVM) by inheriting the force and motion principle of contour wave for automatically estimating building contours. The contour estimation problems, conventionally solved by snake and level-set based ACMs, are unified to formulate as second-order partial differential equation to model the contour evolution. In parallel with the current ACM methods, we propose two types of evolution paradigms: curve-CVM and surface-CVM, from the perspective of the vibration spaces of contour waves. To tailor personalization contours for specific targets, we parameterize the constant coefficient wave differential equation through a convolutional network, and hereby integrate them into a unified learnable model for contour extraction. Through adopting finite difference optimization, we can progressively perform the contour evolution from an initial state through a recursive computation on the contour vibration model. Both the building contour evolution and the model optimization are modulated to form a close-looping end-to-end network. Besides, we make a discussion of ours <i>vs</i> the conventional ACMs, all which can be interpreted uniformly from the view of differential equation in different evolution domains. Comprehensive evaluations on several building datasets demonstrate the effectiveness and superiority of our proposed G-CVM when compared with other state-of-the-art building extraction networks and deep active contour solutions.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
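As an illustration of the general idea of evolving a contour with a second-order (wave-type) PDE solved by finite differences, the sketch below iterates a damped discrete wave equation on a closed 2D contour. The time step, wave speed, damping, and the toy force_fn are assumptions for illustration only; G-CVM itself learns the equation's coefficients with a convolutional network rather than fixing them.

```python
import numpy as np

def evolve_contour(points, force_fn, num_steps=200, dt=0.1, c=1.0, damping=0.1):
    """Evolve a closed 2D contour with a damped wave (vibration) equation.

    points:   (N, 2) array of contour vertices (closed, so neighbors wrap around).
    force_fn: maps an (N, 2) contour to an (N, 2) external force, e.g. an
              image-gradient attraction term (assumed here, not the paper's exact term).
    """
    prev = points.copy()      # contour at step t-1
    curr = points.copy()      # contour at step t
    for _ in range(num_steps):
        # discrete Laplacian along the contour (second derivative w.r.t. arc index)
        lap = np.roll(curr, -1, axis=0) - 2.0 * curr + np.roll(curr, 1, axis=0)
        # damped wave equation with central differences in time:
        # x_{t+1} = 2 x_t - x_{t-1} + dt^2 (c^2 lap + F) - damping * dt * (x_t - x_{t-1})
        nxt = (2.0 * curr - prev
               + dt ** 2 * (c ** 2 * lap + force_fn(curr))
               - damping * dt * (curr - prev))
        prev, curr = curr, nxt
    return curr

if __name__ == "__main__":
    # toy example: a circle gently attracted toward the origin
    theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
    circle = np.stack([np.cos(theta), np.sin(theta)], axis=1) * 5.0
    final = evolve_contour(circle, force_fn=lambda x: -0.05 * x)
    print(np.linalg.norm(final, axis=1).mean())
```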
Simplified Concrete Dropout - Improving the Generation of Attribution Masks for Fine-grained Classification
IF 19.5, CAS Zone 2, Computer Science
International Journal of Computer Vision Pub Date: 2025-05-22 DOI: 10.1007/s11263-025-02453-z
Dimitri Korsch, Maha Shadaydeh, Joachim Denzler
{"title":"Simplified Concrete Dropout - Improving the Generation of Attribution Masks for Fine-grained Classification","authors":"Dimitri Korsch, Maha Shadaydeh, Joachim Denzler","doi":"10.1007/s11263-025-02453-z","DOIUrl":"https://doi.org/10.1007/s11263-025-02453-z","url":null,"abstract":"<p>In fine-grained classification, which is classifying images into subcategories within a common broader category, it is crucial to have precise visual explanations of the classification model’s decision. While commonly used attention- or gradient-based methods deliver either too coarse or too noisy explanations unsuitable for highlighting subtle visual differences reliably, perturbation-based methods can precisely locate pixels causally responsible for the predicted category. The <i>fill-in of the dropout</i> (FIDO) algorithm is one of those methods, which utilizes <i>concrete dropout</i> (CD) to sample a set of attribution masks and updates the sampling parameters based on the output of the classification model. In this paper, we present a solution against the high variance in the gradient estimates, a known problem of the FIDO algorithm that has been mitigated until now by large mini-batch updates of the sampling parameters. First, our solution allows for estimating the parameters with smaller mini-batch sizes without losing the quality of the estimates but with a reduced computational effort. Next, our method produces finer and more coherent attribution masks. Finally, we use the resulting attribution masks to improve the classification performance on three fine-grained datasets without additional fine-tuning steps and achieve results that are otherwise only achieved if ground truth bounding boxes are used.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"32 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
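For readers unfamiliar with the underlying mechanism, the sketch below shows how an attribution mask can be sampled from the binary Concrete (relaxed Bernoulli) distribution that concrete dropout relies on, and averages a FIDO-style objective over several mask samples. The model call, the zero-valued infill, the area penalty weight, and n_samples are illustrative assumptions; the paper's simplified, lower-variance estimator is not reproduced here.

```python
import torch

def sample_concrete_mask(logits, temperature=0.1):
    """Sample a relaxed binary mask from per-pixel keep logits (binary Concrete distribution).

    logits: (H, W) tensor of learned sampling parameters.
    Returns a differentiable mask in (0, 1) of the same shape.
    """
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)      # uniform noise
    noise = torch.log(u) - torch.log(1 - u)                 # logistic noise
    return torch.sigmoid((logits + noise) / temperature)    # relaxed Bernoulli sample

def fido_style_objective(model, image, target_class, logits, n_samples=4):
    """Toy FIDO-style objective: keep the prediction while penalizing mask area.

    image: (C, H, W) tensor; model: callable mapping (1, C, H, W) to class scores.
    Averaging over n_samples mimics the mini-batch of mask samples whose size
    the paper's simplification allows to shrink.
    """
    losses = []
    for _ in range(n_samples):
        mask = sample_concrete_mask(logits)                  # (H, W)
        masked = image * mask.unsqueeze(0)                   # zero infill (FIDO uses a generative infill)
        score = model(masked.unsqueeze(0))[0, target_class]
        losses.append(-score + 1e-3 * mask.mean())           # preserve prediction, keep mask small
    return torch.stack(losses).mean()
```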
Spatial-Temporal Transformer for Single RGB-D Camera Synchronous Tracking and Reconstruction of Non-rigid Dynamic Objects
IF 19.5, CAS Zone 2, Computer Science
International Journal of Computer Vision Pub Date: 2025-05-21 DOI: 10.1007/s11263-025-02469-5
Xiaofei Liu, Zhengkun Yi, Xinyu Wu, Wanfeng Shang
{"title":"Spatial-Temporal Transformer for Single RGB-D Camera Synchronous Tracking and Reconstruction of Non-rigid Dynamic Objects","authors":"Xiaofei Liu, Zhengkun Yi, Xinyu Wu, Wanfeng Shang","doi":"10.1007/s11263-025-02469-5","DOIUrl":"https://doi.org/10.1007/s11263-025-02469-5","url":null,"abstract":"<p>We propose a simple and effective method that views the problem of single RGB-D camera synchronous tracking and reconstruction of non-rigid dynamic objects as an aligned sequential point cloud prediction problem. Our method does not require additional data transformations (truncated signed distance function or deformation graphs, etc.), alignment constraints (handcrafted features or optical flow, etc.), and prior regularities (as-rigid-as-possible or embedded deformation, etc.). We propose an end-to-end model architecture that is <b>TR</b>ansformer <b>for</b> synchronous <b>T</b>racking and <b>R</b>econstruction of non-rigid dynamic target based on RGB-D images from a monocular camera, called TR4TR. We use a spatial-temporal combined 2D image encoder that directly encodes features from RGB-D sequence images, and a 3D point decoder to generate aligned sequential point cloud containing tracking and reconstruction results. The TR4TR model outperforms the baselines on the DeepDeform non-rigid dataset, and outperforms the state-of-the-art method by 8.82% on the deformation error evaluation metric. In addition, TR4TR is more robust when the target undergoes large inter-frame deformation. The code is available at https://github.com/xfliu1998/tr4tr-main.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"34 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144104762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
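The abstract describes a space-time 2D encoder paired with a 3D point decoder that outputs aligned per-frame point clouds. The toy module below sketches only that input/output structure with standard PyTorch transformer blocks; every dimension, layer count, and the learned-query design are assumptions and not the actual TR4TR architecture.

```python
import torch
import torch.nn as nn

class TinyTrackReconstruct(nn.Module):
    """Toy spatial-temporal encoder plus point decoder (illustrative only, not TR4TR itself)."""

    def __init__(self, num_points=1024, d_model=256, n_frames=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(4, d_model, kernel_size=16, stride=16)   # RGB-D frame -> patch tokens
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)          # joint space-time attention
        self.query = nn.Parameter(torch.randn(n_frames * num_points, d_model)) # learned point queries
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.to_xyz = nn.Linear(d_model, 3)
        self.n_frames, self.num_points = n_frames, num_points

    def forward(self, rgbd_seq):                                  # (B, T, 4, H, W)
        b, t, c, h, w = rgbd_seq.shape
        tokens = self.patch_embed(rgbd_seq.flatten(0, 1))         # (B*T, D, h', w')
        tokens = tokens.flatten(2).transpose(1, 2)                # (B*T, N, D) patch tokens per frame
        tokens = tokens.reshape(b, t * tokens.shape[1], -1)       # space-time token sequence
        memory = self.encoder(tokens)
        queries = self.query.unsqueeze(0).expand(b, -1, -1)
        pts = self.to_xyz(self.decoder(queries, memory))          # (B, T*P, 3)
        return pts.reshape(b, self.n_frames, self.num_points, 3)  # aligned point cloud per frame
```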
Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID
IF 19.5, CAS Zone 2, Computer Science
International Journal of Computer Vision Pub Date: 2025-05-19 DOI: 10.1007/s11263-025-02461-z
De Cheng, Lingfeng He, Nannan Wang, Dingwen Zhang, Xinbo Gao
{"title":"Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID","authors":"De Cheng, Lingfeng He, Nannan Wang, Dingwen Zhang, Xinbo Gao","doi":"10.1007/s11263-025-02461-z","DOIUrl":"https://doi.org/10.1007/s11263-025-02461-z","url":null,"abstract":"<p>Unsupervised visible-infrared person re-identification (USL-VI-ReID) seeks to match pedestrian images of the same individual across different modalities without human annotations for model learning. Previous methods unify pseudo-labels of cross-modality images through label association algorithms and then design contrastive learning framework for global feature learning. However, these methods overlook the cross-modality variations in feature representation and pseudo-label distributions brought by fine-grained patterns. This insight results in insufficient modality-shared learning when only global features are optimized. To address this issue, we propose a Semantic-Aligned Learning with Collaborative Refinement (SALCR) framework, which builds up optimization objective for specific fine-grained patterns emphasized by each modality, thereby achieving complementary alignment between the label distributions of different modalities. Specifically, we first introduce a Dual Association with Global Learning (DAGI) module to unify the pseudo-labels of cross-modality instances in a bi-directional manner. Afterward, a Fine-Grained Semantic-Aligned Learning (FGSAL) module is carried out to explore part-level semantic-aligned patterns emphasized by each modality from cross-modality instances. Optimization objective is then formulated based on the semantic-aligned features and their corresponding label space. To alleviate the side-effects arising from noisy pseudo-labels, we propose a Global-Part Collaborative Refinement (GPCR) module to mine reliable positive sample sets for the global and part features dynamically and optimize the inter-instance relationships. Extensive experiments demonstrate the effectiveness of the proposed method, which achieves superior performances to state-of-the-art methods. Our code is available at https://github.com/FranklinLingfeng/code-for-SALCR.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"31 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144097302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
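As a generic illustration of the cross-modality label-association step that USL-VI-ReID methods build on (not the paper's DAGI module), the sketch below clusters each modality separately and matches cluster centroids across modalities with the Hungarian algorithm so that matched clusters share one pseudo-label. The clustering parameters are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import DBSCAN

def associate_modalities(vis_feats, ir_feats, eps=0.6):
    """Toy cross-modality pseudo-label association on (N, D) feature arrays."""
    def cluster(feats):
        labels = DBSCAN(eps=eps, min_samples=4, metric="cosine").fit_predict(feats)
        ids = sorted(set(labels) - {-1})                       # ignore DBSCAN outliers
        cents = np.stack([feats[labels == i].mean(0) for i in ids])
        cents /= np.linalg.norm(cents, axis=1, keepdims=True)
        return labels, ids, cents

    v_lab, v_ids, v_cent = cluster(vis_feats)
    i_lab, i_ids, i_cent = cluster(ir_feats)
    cost = 1.0 - v_cent @ i_cent.T                             # cosine distance between centroids
    rows, cols = linear_sum_assignment(cost)                   # one-to-one centroid matching
    mapping = {i_ids[c]: v_ids[r] for r, c in zip(rows, cols)}
    unified_ir = np.array([mapping.get(l, -1) for l in i_lab]) # unmatched clusters stay unlabeled
    return v_lab, unified_ir
```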
Learning to Deblur Polarized Images
IF 19.5, CAS Zone 2, Computer Science
International Journal of Computer Vision Pub Date: 2025-05-19 DOI: 10.1007/s11263-025-02459-7
Chu Zhou, Minggui Teng, Xinyu Zhou, Chao Xu, Imari Sato, Boxin Shi
{"title":"Learning to Deblur Polarized Images","authors":"Chu Zhou, Minggui Teng, Xinyu Zhou, Chao Xu, Imari Sato, Boxin Shi","doi":"10.1007/s11263-025-02459-7","DOIUrl":"https://doi.org/10.1007/s11263-025-02459-7","url":null,"abstract":"<p>A polarization camera can capture four linear polarized images with different polarizer angles in a single shot, which is useful in polarization-based vision applications since the degree of linear polarization (DoLP) and the angle of linear polarization (AoLP) can be directly computed from the captured polarized images. However, since the on-chip micro-polarizers block part of the light so that the sensor often requires a longer exposure time, the captured polarized images are prone to motion blur caused by camera shakes, leading to noticeable degradation in the computed DoLP and AoLP. Deblurring methods for conventional images often show degraded performance when handling the polarized images since they only focus on deblurring without considering the polarization constraints. In this paper, we propose a polarized image deblurring pipeline to solve the problem in a polarization-aware manner by adopting a divide-and-conquer strategy to explicitly decompose the problem into two less ill-posed sub-problems, and design a two-stage neural network to handle the two sub-problems respectively. Experimental results show that our method achieves state-of-the-art performance on both synthetic and real-world images, and can improve the performance of polarization-based vision applications such as image dehazing and reflection removal.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"76 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144088326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
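The abstract notes that DoLP and AoLP are computed directly from the four captured polarized images. The snippet below shows the standard Stokes-parameter computation for a four-angle polarization camera; the deblurring pipeline itself is the paper's contribution and is not sketched here.

```python
import numpy as np

def polarization_from_four(i0, i45, i90, i135, eps=1e-8):
    """Compute Stokes parameters, DoLP, and AoLP from the four polarizer-angle images
    captured by a division-of-focal-plane polarization camera (standard formulae)."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)               # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)   # degree of linear polarization
    aolp = 0.5 * np.arctan2(s2, s1)                  # angle of linear polarization (radians)
    return s0, dolp, aolp
```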
Generalized Closed-Form Formulae for Feature-Based Subpixel Alignment in Patch-Based Matching
IF 19.5, CAS Zone 2, Computer Science
International Journal of Computer Vision Pub Date: 2025-05-19 DOI: 10.1007/s11263-025-02457-9
Laurent Valentin Jospin, Hamid Laga, Farid Boussaid, Mohammed Bennamoun
{"title":"Generalized Closed-Form Formulae for Feature-Based Subpixel Alignment in Patch-Based Matching","authors":"Laurent Valentin Jospin, Hamid Laga, Farid Boussaid, Mohammed Bennamoun","doi":"10.1007/s11263-025-02457-9","DOIUrl":"https://doi.org/10.1007/s11263-025-02457-9","url":null,"abstract":"<p>Patch-based matching is a technique meant to measure the disparity between pixels in a source and target image and is at the core of various methods in computer vision. When the subpixel disparity between the source and target images is required, the cost function or the target image has to be interpolated. While cost-based interpolation is easier to implement, multiple works have shown that image-based interpolation can increase the accuracy of the disparity estimate. In this paper we review closed-form formulae for subpixel disparity computation for one dimensional matching, e.g., rectified stereo matching, for the standard cost functions used in patch-based matching. We then propose new formulae to generalize to high-dimensional search spaces, which is necessary for unrectified stereo matching and optical flow. We also compare the image-based interpolation formulae with traditional cost-based formulae, and show that image-based interpolation brings a significant improvement over the cost-based interpolation methods for two dimensional search spaces, and small improvement in the case of one dimensional search spaces. The zero-mean normalized cross correlation cost function is found to be preferable for subpixel alignment. A new error model, based on very broad assumptions is outlined in the Supplementary Material to demonstrate why these image-based interpolation formulae outperform their cost-based counterparts and why the zero-mean normalized cross correlation function is preferable for subpixel alignement.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"121 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144088324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
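For context, the classic cost-based baseline the paper compares against is the three-point parabolic refinement around the best integer disparity, shown below for the one-dimensional case; the paper's image-based formulae and their two-dimensional generalization are not reproduced here.

```python
import numpy as np

def parabolic_subpixel(costs, d):
    """Classic cost-based parabolic refinement around an integer disparity d.

    costs: 1D array of matching costs per integer disparity (lower is better).
    Returns d plus a sub-pixel offset in (-0.5, 0.5), from fitting a parabola
    through the cost values at d-1, d, and d+1.
    """
    c_m, c_0, c_p = costs[d - 1], costs[d], costs[d + 1]
    denom = c_m - 2.0 * c_0 + c_p
    if denom <= 0:                       # degenerate or flat cost curve: keep the integer estimate
        return float(d)
    return d + 0.5 * (c_m - c_p) / denom

if __name__ == "__main__":
    costs = np.array([5.0, 2.0, 1.0, 1.5, 4.0])   # toy cost curve with minimum near index 2
    print(parabolic_subpixel(costs, int(np.argmin(costs))))
```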
SimZSL: Zero-Shot Learning Beyond a Pre-defined Semantic Embedding Space
IF 19.5, CAS Zone 2, Computer Science
International Journal of Computer Vision Pub Date: 2025-05-17 DOI: 10.1007/s11263-025-02422-6
Mina Ghadimi Atigh, Stephanie Nargang, Martin Keller-Ressel, Pascal Mettes
{"title":"SimZSL: Zero-Shot Learning Beyond a Pre-defined Semantic Embedding Space","authors":"Mina Ghadimi Atigh, Stephanie Nargang, Martin Keller-Ressel, Pascal Mettes","doi":"10.1007/s11263-025-02422-6","DOIUrl":"https://doi.org/10.1007/s11263-025-02422-6","url":null,"abstract":"<p>Zero-shot recognition is centered around learning representations to transfer knowledge from seen to unseen classes. Where foundational approaches perform the transfer with semantic embedding spaces, <i>e.g.,</i> from attributes or word vectors, the current state-of-the-art relies on prompting pre-trained vision-language models to obtain class embeddings. Whether zero-shot learning is performed with attributes, CLIP, or something else, current approaches <i>de facto</i> assume that there is a pre-defined embedding space in which seen and unseen classes can be positioned. Our work is concerned with real-world zero-shot settings where a pre-defined embedding space can no longer be assumed. This is natural in domains such as biology and medicine, where class names are not common English words, rendering vision-language models useless; or neuroscience, where class relations are only given with non-semantic human comparison scores. We find that there is one data structure enabling zero-shot learning in both standard and non-standard settings: a similarity matrix spanning the seen and unseen classes. We introduce four <i>similarity-based zero-shot learning</i> challenges, tackling open-ended scenarios such as learning with uncommon class names, learning from multiple partial sources, and learning with missing knowledge. As the first step for zero-shot learning beyond a pre-defined semantic embedding space, we propose <span>(kappa )</span>-MDS, a general approach that obtains a prototype for each class on any manifold from similarities alone, even when part of the similarities are missing. Our approach can be plugged into any standard, hyperspherical, or hyperbolic zero-shot learner. Experiments on existing datasets and the new benchmarks show the promise and challenges of similarity-based zero-shot learning.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"127 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144083184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
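As a rough illustration of building class prototypes from a similarity matrix, the sketch below uses plain Euclidean MDS as a stand-in (not the paper's κ-MDS, which also supports hyperspherical and hyperbolic manifolds and missing similarities) and then classifies projected image features by nearest unseen prototype. The embedding dimensionality and the assumption that image features have already been mapped into the prototype space by a model trained on seen classes are illustrative.

```python
import numpy as np
from sklearn.manifold import MDS

def prototypes_from_similarity(sim, n_dims=16, random_state=0):
    """Embed one prototype per class from a square (seen+unseen) x (seen+unseen) similarity matrix."""
    dissim = sim.max() - sim                           # convert similarities to dissimilarities
    mds = MDS(n_components=n_dims, dissimilarity="precomputed", random_state=random_state)
    return mds.fit_transform(dissim)                   # (num_classes, n_dims) prototypes

def zero_shot_predict(image_feats, prototypes, unseen_ids):
    """Assign each image feature (already projected into prototype space) to the nearest unseen prototype."""
    unseen = prototypes[unseen_ids]                                        # (U, n_dims)
    dists = np.linalg.norm(image_feats[:, None] - unseen[None], axis=-1)   # (M, U)
    return np.asarray(unseen_ids)[dists.argmin(1)]
```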
HumanLiff: Layer-wise 3D Human Diffusion Model
IF 19.5, CAS Zone 2, Computer Science
International Journal of Computer Vision Pub Date: 2025-05-16 DOI: 10.1007/s11263-025-02477-5
Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, Ziwei Liu
{"title":"HumanLiff: Layer-wise 3D Human Diffusion Model","authors":"Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, Ziwei Liu","doi":"10.1007/s11263-025-02477-5","DOIUrl":"https://doi.org/10.1007/s11263-025-02477-5","url":null,"abstract":"<p>3D human generation from 2D images has achieved remarkable progress through the synergistic utilization of neural rendering and generative models. Existing 3D human generative models mainly generate a clothed 3D human as an inseparable 3D model in a single pass, while rarely considering the layer-wise nature of a clothed human body, which often consists of the human body and various clothes such as underwear, outerwear, trousers, shoes, etc. In this work, we propose <b>HumanLiff</b>, the first layer-wise 3D human generative model with a unified diffusion process. Specifically, HumanLiff firstly generates minimal-clothed humans, represented by tri-plane features, in a canonical space, and then progressively generates clothes in a layer-wise manner. In this way, the 3D human generation is thus formulated as a sequence of diffusion-based 3D conditional generation. To reconstruct more fine-grained 3D humans with tri-plane representation, we propose a tri-plane shift operation that splits each tri-plane into three sub-planes and shifts these sub-planes to enable feature grid subdivision. To further enhance the controllability of 3D generation with 3D layered conditions, HumanLiff hierarchically fuses tri-plane features and 3D layered conditions to facilitate the 3D diffusion model learning. Extensive experiments on two layer-wise 3D human datasets, SynBody (synthetic) and TightCap (real-world), validate that HumanLiff significantly outperforms state-of-the-art methods in layer-wise 3D human generation. Our code and datasets are available at https://skhu101.github.io/HumanLiff.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"15 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144066060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
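The tri-plane representation mentioned in the abstract is typically queried as below: a 3D point is projected onto three orthogonal feature planes, each plane is bilinearly sampled, and the three features are summed. This is only the standard lookup; HumanLiff's tri-plane shift (sub-plane splitting and shifting) and the diffusion model itself are not reproduced here.

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes, xyz):
    """Standard tri-plane feature lookup as used by tri-plane based 3D generators.

    planes: (3, C, H, W) feature planes for the XY, XZ, and YZ planes.
    xyz:    (N, 3) query points with coordinates normalized to [-1, 1].
    Returns (N, C) features, summed over the three planes.
    """
    coords = [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]]    # project each point onto each plane
    feats = []
    for plane, uv in zip(planes, coords):
        grid = uv.view(1, 1, -1, 2)                              # (1, 1, N, 2) sampling grid
        f = F.grid_sample(plane.unsqueeze(0), grid,
                          mode="bilinear", align_corners=True)   # (1, C, 1, N)
        feats.append(f.squeeze(0).squeeze(1).t())                # (N, C)
    return torch.stack(feats).sum(0)
```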
High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion
IF 19.5, CAS Zone 2, Computer Science
International Journal of Computer Vision Pub Date: 2025-05-16 DOI: 10.1007/s11263-025-02448-w
Libo Zhang, Yongsheng Yu, Jiali Yao, Heng Fan
{"title":"High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion","authors":"Libo Zhang, Yongsheng Yu, Jiali Yao, Heng Fan","doi":"10.1007/s11263-025-02448-w","DOIUrl":"https://doi.org/10.1007/s11263-025-02448-w","url":null,"abstract":"<p>Generative Adversarial Network (GAN) inversion have demonstrated excellent performance in image inpainting that aims to restore lost or damaged image texture using its unmasked content. Previous GAN inversion-based methods usually utilize well-trained GAN models as effective priors to generate the realistic regions for missing holes. Despite excellence, they ignore a hard constraint that the unmasked regions in the input and the output should be the same, resulting in a gap between GAN inversion and image inpainting and thus degrading the performance. Besides, existing GAN inversion approaches often consider a single modality of the input image, neglecting other auxiliary cues in images for improvements. Addressing these problems, we propose a novel GAN inversion approach, dubbed <i>MMInvertFill</i>, for image inpainting. MMInvertFill contains primarily a multimodal guided encoder with a pre-modulation and a GAN generator with <span>( mathcal {F} &amp; mathcal {W}^+)</span> latent space. Specifically, the multimodal encoder aims to enhance the multi-scale structures with additional semantic segmentation edge texture modalities through a gated mask-aware attention module. Afterwards, a pre-modulation is presented to encode these structures into style vectors. To mitigate issues of conspicuous color discrepancy and semantic inconsistency, we introduce the <span>( mathcal {F} &amp; mathcal {W}^+)</span> latent space to bridge the gap between GAN inversion and image inpainting. Furthermore, in order to reconstruct faithful and photorealistic images, we devise a simple yet effective Soft-update Mean Latent module to capture more diversified in-domain patterns for generating high-fidelity textures for massive corruptions. In our extensive experiments on six challenging datasets, including CelebA-HQ, Places2, OST, CityScapes, MetFaces and Scenery, we show that our MMInvertFill qualitatively and quantitatively outperforms other state-of-the-arts and it supports the completion of out-of-domain images effectively. Our project webpage including code and results will be available at https://yeates.github.io/mm-invertfill.\u0000</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"14 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144067126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
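The hard constraint discussed in the abstract, that unmasked pixels of the output must equal the input, is commonly enforced by compositing the generator output with the known pixels, as in the sketch below. This is a generic composition step for illustration; the paper addresses the constraint inside the network via its F&W+ latent space rather than by post-hoc blending alone.

```python
import torch

def composite_inpainting(input_image, generated, mask):
    """Enforce that known (unmasked) pixels are copied verbatim from the input.

    input_image: (B, 3, H, W) corrupted input.
    generated:   (B, 3, H, W) output of the GAN-inversion generator.
    mask:        (B, 1, H, W), 1 inside the missing hole, 0 on known pixels.
    Only hole pixels come from the generator; known pixels stay untouched.
    """
    return mask * generated + (1.0 - mask) * input_image
```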
Defending Against Adversarial Examples Via Modeling Adversarial Noise
IF 19.5, CAS Zone 2, Computer Science
International Journal of Computer Vision Pub Date: 2025-05-14 DOI: 10.1007/s11263-025-02467-7
Dawei Zhou, Nannan Wang, Bo Han, Tongliang Liu, Xinbo Gao
{"title":"Defending Against Adversarial Examples Via Modeling Adversarial Noise","authors":"Dawei Zhou, Nannan Wang, Bo Han, Tongliang Liu, Xinbo Gao","doi":"10.1007/s11263-025-02467-7","DOIUrl":"https://doi.org/10.1007/s11263-025-02467-7","url":null,"abstract":"<p>Adversarial examples have become a major threat to the reliable application of deep learning models. Meanwhile, this issue promotes the development of adversarial defenses. Adversarial noise contains well-generalizing and misleading features, which can manipulate predicted labels to be flipped maliciously. Motivated by this, we study <i>modeling adversarial noise</i> for defending against adversarial examples by learning the transition relationship between adversarial labels (<i>i.e.</i>, flipped labels caused by adversarial noise) and natural labels (<i>i.e.</i>, real labels of natural samples). In this work, we propose an adversarial defense method from the perspective of modeling adversarial noise. Specifically, we construct an instance-dependent label transition matrix to represent the label transition relationship for explicitly modeling adversarial noise. The label transition matrix is obtained from the input sample by leveraging a label transition network. By exploiting the label transition matrix, we can infer the natural label from the adversarial label and thus correct wrong predictions misled by adversarial noise. Additionally, to enhance the robustness of the label transition network, we design an adversarial robustness constraint at the transition matrix level. Experimental results demonstrate that our method effectively improves the robust accuracy against multiple attacks and exhibits great performance in detecting adversarial input samples.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"28 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143979610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
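One plausible way to set up an instance-dependent label transition defense is sketched below: a base classifier produces (possibly flipped) probabilities, a transition network predicts a row-stochastic C x C matrix from the same input, and the corrected distribution is obtained by applying the matrix to the classifier output. The two sub-networks, the out_dim attribute, the factorization, and the loss are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionDefense(nn.Module):
    """Toy sketch of correcting adversarially flipped predictions with an
    instance-dependent label transition matrix (illustrative only)."""

    def __init__(self, classifier, feature_net, num_classes):
        super().__init__()
        self.classifier = classifier      # base model: input -> (B, num_classes) logits
        self.feature_net = feature_net    # transition-network backbone: input -> (B, feature_net.out_dim)
        self.transition_head = nn.Linear(feature_net.out_dim, num_classes * num_classes)
        self.num_classes = num_classes

    def forward(self, x):
        p_adv = F.softmax(self.classifier(x), dim=-1)                  # possibly flipped prediction
        t = self.transition_head(self.feature_net(x))
        t = F.softmax(t.view(-1, self.num_classes, self.num_classes), dim=-1)   # row-stochastic T(x)
        # entry T[j, i] approximates P(natural label i | adversarial label j),
        # so the corrected distribution is p_adv @ T(x)
        p_nat = torch.bmm(p_adv.unsqueeze(1), t).squeeze(1)
        return p_nat

def defense_loss(model, x_adv, y_natural):
    """Train the transition network so corrected predictions match natural labels on adversarial inputs."""
    p_nat = model(x_adv)
    return F.nll_loss(torch.log(p_nat + 1e-8), y_natural)
```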