Higher-order motion calibration and sparsity based outlier correction for video FRUC
Jiale He, Qunbing Xia, Gaobo Yang, Xiangling Ding
Signal Processing: Image Communication, vol. 138, Article 117327, 2025-04-17. DOI: 10.1016/j.image.2025.117327

Abstract: For frame rate up-conversion (FRUC), one of the key challenges is dealing with the irregular and large motions that are widespread in video scenes. Most existing FRUC methods assume constant brightness and linear motion, which easily leads to undesirable artifacts such as motion blur and frame flickering. In this work, we propose an improved FRUC method that uses a higher-order model for motion calibration and a sparse sampling strategy for outlier correction. Unidirectional motion estimation is used to accurately locate objects from the previous frame to the following frame in a coarse-to-fine pyramid structure. The object motion trajectory is then fine-tuned to approximate the real motion, and possible outlier regions are located and recorded. Moreover, image sparsity is exploited as prior knowledge for outlier correction, and the outlier index map is used to design the measurement matrix. Based on the theory of sparse sampling, the outlier regions are reconstructed to eliminate side effects such as overlapping, holes and blurring. Extensive experimental results demonstrate that the proposed approach outperforms state-of-the-art FRUC methods in terms of both the objective and subjective quality of the interpolated frames.
FANet: Feature attention network for semantic segmentation
Lin Zhu, Linxi Li, Mingwei Tang, Wenrui Niu, Jianhua Xie, Hongyun Mao
Signal Processing: Image Communication, vol. 138, Article 117330, 2025-04-17. DOI: 10.1016/j.image.2025.117330

Abstract: Semantic segmentation based on scene parsing assigns a category label to each pixel in an image. Existing neural network models are useful tools for understanding the objects in a scene, but they ignore the heterogeneity of the information carried by individual features, leading to pixel classification confusion and unclear boundaries. This paper therefore proposes a novel Feature Attention Network (FANet). First, an adjustment algorithm is presented to capture attention feature matrices that effectively select feature dependencies. Second, a hybrid extraction module (HEM) is constructed to aggregate long-range dependencies based on the proposed adjustment algorithm. Finally, an adaptive hierarchical fusion module (AHFM) is employed to aggregate multi-scale features by spatially filtering conflicting information, which improves the scale invariance of the features. Experimental results on popular benchmarks (PASCAL VOC 2012, Cityscapes and ADE20K) indicate that our algorithm outperforms competing algorithms.
Adaptive cross-channel transformation based on self-modulation for learned image compression
Wen Tan, Youneng Bao, Fanyang Meng, Chao Li, Yongsheng Liang
Signal Processing: Image Communication, vol. 138, Article 117325, 2025-04-17. DOI: 10.1016/j.image.2025.117325

Abstract: Learned image compression has recently achieved excellent rate–distortion performance, and nonlinear transformation has become a critical component for performance improvement. While Generalized Divisive Normalization (GDN) is a widely used method that exploits channel correlation for effective nonlinear representation, its use of cross-channel relationships for each element of the features remains limited. In this paper, we propose a novel cross-channel transformation based on self-modulation, named SMCCT. The SMCCT takes intermediate feature maps as input to capture cross-channel correlation and generates affine transformation parameters for element-wise feature modulation. The proposed transformation enables adaptive weighting and fine-grained control over the features, which helps to learn expressive features and further reduce redundancy. The SMCCT can be flexibly integrated into learned image compression models. Experimental results demonstrate that the proposed method achieves superior rate–distortion performance compared with existing learned image compression methods and outperforms traditional codecs under quality metrics such as PSNR and MS-SSIM. Specifically, under the PSNR metric, our method outperforms the latest codec VTM-12.1 by 5.47% and 10.25% in BD-rate on the Kodak and Tecnick datasets, respectively; under the MS-SSIM metric, it outperforms VTM-12.1 by 50.97% and 49.81% in BD-rate on the same datasets.
{"title":"Hierarchical contrastive learning for unsupervised 3D action recognition","authors":"Haoyuan Zhang , Qingquan Li","doi":"10.1016/j.image.2025.117329","DOIUrl":"10.1016/j.image.2025.117329","url":null,"abstract":"<div><div>Unsupervised contrastive 3D action representation learning has made great progress recently. However, most works rely on only the direct instance-level comparison with unreasonable positive/negative constraint, which degrades the learning performance. In this paper, we propose a Hierarchical Contrastive Scheme (HCS) for unsupervised skeleton 3D action representation learning, which takes advantage of multi-level contrast. Specifically, we keep the instance-level contrast to draw the different augmentations of the same instance close, targets to learn intra-instance consistency. Then we extend the contrastive objective from individual instances to clusters by enforcing consistency between cluster assignment from different instance of same category, aims at learning inter-instance consistency. Compared with previous methods, HCS enables intra/inter-instance consistency pursuit via multi-level contrast, without inflexible positive/negative constraint, which leads to a more discriminative feature space. Experimental results validate that the proposed framework outperforms the previous state-of-the-art methods on the challenging NTU RGB+D and PKU-MMD datasets.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117329"},"PeriodicalIF":3.4,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143855689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Camera calibration using property of asymptotes with application to sports scenes","authors":"Fengli Yang, Xuechun Wang, Yue Zhao","doi":"10.1016/j.image.2025.117331","DOIUrl":"10.1016/j.image.2025.117331","url":null,"abstract":"<div><div>Inspired by Ying's work on the calibration technique, this study proposes a new planar pattern (referred to as the phi-type model hereinafter), which includes a circle and diameter, as the calibration scene. In sports scenarios, such as a soccer match or basketball court, most existing methods require information of the scene points in a three-dimensional space. However, an interesting observation in the midfield is that the centre circle and the halfway line form a phi-type template. A new automatic method using the properties of asymptotes is proposed based on the images of the midfield. All intrinsic parameters of the camera can be determined without any assumptions such as zero skew or unitary aspect ratio. The main advantages of our technique are that it neither involves point or line matching nor does it require the metric information of the model plane. The feasibility and validity of the proposed algorithm were verified by testing the noise sensitivity and performing image metric rectification.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117331"},"PeriodicalIF":3.4,"publicationDate":"2025-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hidden dangerous object detection for terahertz body security check images based on adaptive multi-scale decomposition convolution
Zijie Guo, Heng Wu, Shaojuan Luo, Genping Zhao, Chunhua He, Tao Wang
Signal Processing: Image Communication, vol. 137, Article 117323, 2025-04-10. DOI: 10.1016/j.image.2025.117323

Abstract: Detecting hidden dangerous objects with terahertz imaging has recently attracted extensive attention. Many convolutional neural network-based detectors achieve excellent results on common objects. However, for hidden dangerous objects in terahertz body security check images, existing detectors generally suffer from low accuracy and large model sizes, because terahertz images are blurry and of poor quality and global context information is ignored. To address these issues, we propose an enhanced You Only Look Once network (YOLO-AMDC), which integrates an adaptive multi-scale large-kernel decomposition convolution (AMDC) module. Specifically, we design the AMDC module to enhance the feature expression ability of the YOLO framework. Moreover, we employ the Bi-Level Routing Attention (BRA) mechanism and a simple parameter-free attention module (SimAM) to exploit contextual information and improve the detection of dangerous objects. Additionally, we apply model pruning to reduce the number of model parameters. Experimental results show that YOLO-AMDC outperforms other state-of-the-art methods: compared with YOLOv8s, it reduces the parameter count by 3.9 M and improves mAP@50 by 5%. Detection performance remains competitive even when the number of parameters is significantly reduced by pruning.
ORB-SLAM3 and dense mapping algorithm based on improved feature matching
Delin Zhang, Guangxiang Yang, Guangling Yu, Baofeng Yang, Xiaoheng Wang
Signal Processing: Image Communication, vol. 137, Article 117322, 2025-04-10. DOI: 10.1016/j.image.2025.117322

Abstract: ORB-SLAM3 is currently the mainstream visual SLAM system and relies on feature matching based on ORB keypoints. However, it faces two main issues. First, feature matching is time-consuming, and an insufficient number of feature point matches lowers the localization accuracy of the algorithm. Second, it cannot construct dense point cloud maps, which limits its applicability in demanding scenarios such as path planning. To address these issues, this paper proposes an ORB-SLAM3 and dense mapping algorithm based on improved feature matching. In the feature matching stage of ORB-SLAM3, motion smoothness constraints are introduced and the image is divided into a grid. Feature points lying at the edge of a grid cell are assigned to multiple adjacent cells, which resolves both the incorrect assignment of feature points to grid cells and the associated time cost. This reduces matching time and increases the number of matched pairs, improving the positioning accuracy of ORB-SLAM3. Moreover, a dense mapping thread is added to construct dense point cloud maps in real time from keyframes and the corresponding poses filtered in the feature matching stage. Finally, validation experiments were conducted on the TUM dataset. The results show that the improved algorithm reduces feature matching time by 75.71% compared to ORB-SLAM3, increases the number of feature point matches by 88.69%, and improves localization accuracy by 9.44%. The validation also confirms that the improved algorithm can construct dense maps in real time. In conclusion, the improved algorithm delivers excellent performance in terms of localization accuracy and dense mapping.
{"title":"Camouflaged instance segmentation based on multi-scale feature contour fusion swin transformer","authors":"Yin-Fu Huang, Feng-Yen Jen","doi":"10.1016/j.image.2025.117328","DOIUrl":"10.1016/j.image.2025.117328","url":null,"abstract":"<div><div>Camouflaged instance segmentation is the latest detection issue for finding hidden objects in an image. Since camouflaged objects hide with similar background colors, it is difficult to detect objects' existence. In this paper, we proposed an instance segmentation model called Multi-scale Feature Contour Fusion Swin Transformer (MFCFSwinT) consisting of seven modules; i.e., Swin Transformer as the backbone for feature extraction, Pyramid of Kernel with Dilation (PKD) and Multi-Feature Fusion (MFF) for multi-scale features, Contour Branch and Contour Feature Fusion (CFF) for feature fusion, and Region Proposal Network (RPN) and Cascade Head for bounding boxes and masks detection. In the experiments, four datasets are used to evaluate the proposed model; i.e., COCO (Common Objects in Context), LVIS v1.0 (Large Vocabulary Instance Segmentation), COD10K (Camouflaged Object Detection), and NC4K. Finally, the experimental results show that MFCFSwinT can achieve better performances than most state-of-the-art models.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117328"},"PeriodicalIF":3.4,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143826394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-adaptive and learnable detail enhancement network for efficient image super resolution
Wenbo Zhang, Lulu Pan, Ke Xu, Guo Li, Yanheng Lv, Lingxiao Li, Le Lei
Signal Processing: Image Communication, vol. 137, Article 117319, 2025-04-04. DOI: 10.1016/j.image.2025.117319

Abstract: In recent years, single image super-resolution (SISR) methods based on deep learning have advanced significantly. However, their high computational complexity and memory demands hinder deployment on resource-constrained devices. Although numerous lightweight super-resolution (SR) methods have been proposed to address this issue, most fail to distinguish between flat and detailed regions in images and treat them uniformly. This lack of targeted design for detailed regions, which are critical to SR performance, makes existing lightweight methods redundant and inefficient. To address these challenges, we propose a simple yet effective Self-adaptive and Learnable Detail Enhancement Network (LDEN) that focuses specifically on the reconstruction of detailed regions. First, we present two designs for reconstructing detailed regions: (1) a Learnable Detail Extraction Block (LDEB), which pays special attention to detailed regions and employs a larger convolution kernel to obtain a larger receptive field; and (2) a lightweight attention mechanism called Detail-oriented Spatial Attention (DSA), which enhances the network's ability to reconstruct detailed regions. Second, we design a hierarchical refinement mechanism, the Efficient Hierarchical Refinement Block (EHRB), which reduces the inadequate information extraction and integration caused by coarse single-layer refinement. Extensive experiments demonstrate that LDEN achieves state-of-the-art performance on all benchmark datasets. Notably, for 4x magnification, LDEN outperforms BSRN (the winner of the model complexity track of the NTIRE 2022 Efficient SR Challenge) by 0.11 dB and 0.12 dB while using nearly 10% fewer parameters.
{"title":"Multimodal style aggregation network for art image classification","authors":"Quan Wang, Guorui Feng","doi":"10.1016/j.image.2025.117309","DOIUrl":"10.1016/j.image.2025.117309","url":null,"abstract":"<div><div>A large number of paintings are digitized, the automatic recognition and retrieval of artistic image styles become very meaningful. Because there is no standard definition and quantitative description of characteristics of artistic style, the representation of style is still a difficult problem. Recently, some work have used deep correlation features in neural style transfer to describe the texture characteristics of paintings and have achieved exciting results. Inspired by this, this paper proposes a multimodal style aggregation network that incorporates three modalities of texture, structure and color information of artistic images. Specifically, the group-wise Gram aggregation model is proposed to capture multi-level texture styles. The global average pooling (GAP) and histogram operation are employed to perform distillation of the high-level structural style and the low-level color style, respectively. Moreover, an improved deep correlation feature calculation method called learnable Gram (L-Gram) is proposed to enhance the ability to express style. Experiments show that our method outperforms several state-of-the-art methods in five style datasets.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117309"},"PeriodicalIF":3.4,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}