Image and Vision Computing: Latest Articles

Phase shift guided dynamic view synthesis from monocular video
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-08-18, DOI: 10.1016/j.imavis.2025.105702
Chuyue Zhao, Xin Huang, Xue Wang, Guoqing Zhou, Qing Wang
{"title":"Phase shift guided dynamic view synthesis from monocular video","authors":"Chuyue Zhao,&nbsp;Xin Huang,&nbsp;Xue Wang,&nbsp;Guoqing Zhou,&nbsp;Qing Wang","doi":"10.1016/j.imavis.2025.105702","DOIUrl":"10.1016/j.imavis.2025.105702","url":null,"abstract":"<div><div>This paper endeavors to address the challenge of synthesizing novel views from monocular videos featuring moving objects, particularly in complex scenes with non-rigid deformations. Existing implicit representations rely on motion estimation in the spatial domain, which often struggle to capture correct temporal dynamics under such conditions. To mitigate the drawback, we propose dynamic positional encoding to represent temporal dynamics as learnable phase shifts and leverage the implicit neural representation (INR) network for iterative optimization. Utilizing optimized phase shifts as guidance enhances the representational capability of the dynamic radiance field, thereby alleviating motion ambiguity and reducing artifacts around moving objects in novel views. This paper also introduces a rational evaluation metric, referred to as “dynamic only+”, for the quantitative assessment of the rendering quality in novel views, focusing on dynamic objects and surrounding regions impacted by motion. Experimental results on multiple challenging datasets demonstrate the favorable performance of the proposed approach over state-of-the-art dynamic view synthesis methods.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105702"},"PeriodicalIF":4.2,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144865776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
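
To make the phase-shift idea concrete: a standard sinusoidal positional encoding can be given one learnable phase offset per frequency band, which is the rough shape of the "dynamic positional encoding" the abstract describes. The sketch below is a minimal, hypothetical PyTorch module; the class and parameter names are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class PhaseShiftedEncoding(nn.Module):
    """Sinusoidal positional encoding whose phase offsets are learnable.

    NeRF-style encodings map t to [sin(2^k * pi * t), cos(2^k * pi * t)];
    here each frequency band additionally receives a learnable phase shift,
    loosely mirroring the "learnable phase shifts" described in the abstract.
    """

    def __init__(self, num_freqs: int = 6):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs) * torch.pi)
        # One learnable phase shift per frequency band (hypothetical parameterization).
        self.phase = nn.Parameter(torch.zeros(num_freqs))

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (N, 1) normalized time stamps in [0, 1]
        arg = t * self.freqs + self.phase          # (N, num_freqs)
        return torch.cat([torch.sin(arg), torch.cos(arg)], dim=-1)

# Usage: encode per-frame time stamps before feeding them to an INR/MLP.
enc = PhaseShiftedEncoding(num_freqs=6)
t = torch.linspace(0, 1, 5).unsqueeze(-1)          # 5 frames
print(enc(t).shape)                                # torch.Size([5, 12])
```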

Codebook prior-guided hybrid attention dehazing network
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-08-16, DOI: 10.1016/j.imavis.2025.105700
Liqin Huang, Hanyu Zheng, Lin Pan, Zhipeng Su, Qiang Wu
{"title":"Codebook prior-guided hybrid attention dehazing network","authors":"Liqin Huang ,&nbsp;Hanyu Zheng ,&nbsp;Lin Pan ,&nbsp;Zhipeng Su ,&nbsp;Qiang Wu","doi":"10.1016/j.imavis.2025.105700","DOIUrl":"10.1016/j.imavis.2025.105700","url":null,"abstract":"<div><div>Transformers have been widely used in image dehazing tasks due to their powerful self-attention mechanism for capturing long-range dependencies. However, directly applying Transformers often leads to coarse details during image reconstruction, especially in complex real-world hazy scenarios. To address this problem, we propose a novel Hybrid Attention Encoder (HAE). Specifically, a channel-attention-based convolution block is integrated into the Swin-Transformer architecture. This design enhances the local features at each position through an overlapping block-wise spatial attention mechanism while leveraging the advantages of channel attention in global information processing to strengthen the network’s representation capability. Moreover, to adapt to various complex hazy environments, a high-quality codebook prior encapsulating the color and texture knowledge of high-resolution clear scenes is introduced. We also propose a more flexible Binary Matching Mechanism (BMM) to better align the codebook prior with the network, further unlocking the potential of the model. Extensive experiments demonstrate that our method consistently outperforms the second-best methods by a margin of 8% to 19% across multiple metrics on the RTTS and URHI datasets. The source code has been released at <span><span>https://github.com/HanyuZheng25/HADehzeNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105700"},"PeriodicalIF":4.2,"publicationDate":"2025-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144865872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
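
A codebook prior of the kind described is usually realized as a vector-quantization lookup: encoder features are matched against a learned dictionary of clear-scene entries and replaced by their nearest codes. The sketch below shows only that generic nearest-code matching step under assumed tensor shapes; it is not the paper's Binary Matching Mechanism.

```python
import torch

def nearest_code_lookup(features: torch.Tensor, codebook: torch.Tensor):
    """Match each feature vector to its nearest codebook entry (L2 distance).

    features: (N, D) flattened encoder features
    codebook: (K, D) learned entries encoding clear-scene color/texture statistics
    Returns the quantized features and the selected indices.
    """
    dists = torch.cdist(features, codebook, p=2)  # pairwise distances, (N, K)
    indices = dists.argmin(dim=1)                 # nearest entry per feature
    quantized = codebook[indices]                 # (N, D) codebook-guided features
    return quantized, indices

features = torch.randn(16, 64)
codebook = torch.randn(512, 64)
q, idx = nearest_code_lookup(features, codebook)
print(q.shape, idx.shape)                         # torch.Size([16, 64]) torch.Size([16])
```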

TSGaussian: Semantic and depth-guided Target-Specific Gaussian Splatting from sparse views
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-08-16, DOI: 10.1016/j.imavis.2025.105706
Liang Zhao, Zehan Bao, Yi Xie, Hong Chen, Yaohui Chen, Weifu Li
{"title":"TSGaussian: Semantic and depth-guided Target-Specific Gaussian Splatting from sparse views","authors":"Liang Zhao ,&nbsp;Zehan Bao ,&nbsp;Yi Xie ,&nbsp;Hong Chen ,&nbsp;Yaohui Chen ,&nbsp;Weifu Li","doi":"10.1016/j.imavis.2025.105706","DOIUrl":"10.1016/j.imavis.2025.105706","url":null,"abstract":"<div><div>Recent advances in Gaussian Splatting have significantly advanced the field, achieving both panoptic and interactive segmentation of 3D scenes. However, existing methodologies often overlook the critical need for reconstructing specified targets with complex structures from sparse views. To address this issue, we introduce TSGaussian, a framework that combines semantic constraints with depth priors to avoid geometry degradation in challenging novel view synthesis tasks. Our approach prioritizes computational resources on designated targets while minimizing background allocation. Bounding boxes from YOLOv9 serve as prompts for Segment Anything Model to generate 2D mask predictions, ensuring semantic accuracy and cost efficiency. TSGaussian effectively clusters 3D gaussians by introducing a compact identity encoding for each Gaussian ellipsoid and incorporating 3D spatial consistency regularization. Leveraging these modules, we propose a pruning strategy to effectively reduce redundancy in 3D gaussians. Extensive experiments demonstrate that TSGaussian outperforms state-of-the-art methods on three standard datasets and a self-built dataset, achieving superior results in novel view synthesis of specific objects. Code is available at: <span><span>https://github.com/leon2000-ai/TSGaussian</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105706"},"PeriodicalIF":4.2,"publicationDate":"2025-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144892838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
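
The detector-to-segmenter prompting described above (detector boxes used as prompts for a promptable mask model) reduces to a small pipeline. In the sketch below, `detect_boxes` and `segment_with_box` are hypothetical stand-ins for the YOLOv9 and Segment Anything calls, so the flow runs without either library and does not reproduce the released TSGaussian code.

```python
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

def boxes_to_masks(
    image: np.ndarray,
    detect_boxes: Callable[[np.ndarray], List[Box]],
    segment_with_box: Callable[[np.ndarray, Box], np.ndarray],
    min_box_area: float = 100.0,
) -> List[np.ndarray]:
    """Run a detector, then prompt a promptable segmenter with each box.

    detect_boxes: returns target bounding boxes (e.g. from a YOLO-style detector)
    segment_with_box: returns a binary mask for one box prompt (e.g. SAM-style)
    """
    masks = []
    for (x0, y0, x1, y1) in detect_boxes(image):
        if (x1 - x0) * (y1 - y0) < min_box_area:
            continue                      # skip spurious tiny detections
        mask = segment_with_box(image, (x0, y0, x1, y1))
        masks.append(mask.astype(bool))
    return masks

# Toy usage with stand-in callables so the sketch runs end to end.
image = np.zeros((128, 128, 3), dtype=np.uint8)
fake_detector = lambda img: [(20.0, 20.0, 80.0, 90.0)]
fake_segmenter = lambda img, box: np.ones(img.shape[:2], dtype=np.uint8)
print(len(boxes_to_masks(image, fake_detector, fake_segmenter)))  # 1
```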

Structure-aware contrastive learning for glomerulus segmentation in renal pathology
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-08-16, DOI: 10.1016/j.imavis.2025.105698
Yuanqing Wang, Tao Wang, Xiangbo Shu, Yuhui Zheng, Jin Ding, Xianghui Fu, Zhaohui Zheng
{"title":"Structure-aware contrastive learning for glomerulus segmentation in renal pathology","authors":"Yuanqing Wang ,&nbsp;Tao Wang ,&nbsp;Xiangbo Shu ,&nbsp;Yuhui Zheng ,&nbsp;Jin Ding ,&nbsp;Xianghui Fu ,&nbsp;Zhaohui Zheng","doi":"10.1016/j.imavis.2025.105698","DOIUrl":"10.1016/j.imavis.2025.105698","url":null,"abstract":"<div><div>Accurate segmentation of glomeruli in renal pathology is challenging due to the difficulty in distinguishing glomeruli from surrounding tissues and their indistinct boundaries. Traditional methods often struggle with local receptive fields, primarily capturing texture rather than the overall shape of these structures. To address this issue, this paper presents a structure-aware contrastive learning strategy for precise glomerular segmentation. We implement a superpixel consistency constraint, dividing pathological images into regions of local consistency to ensure that pixels within the same area maintain feature similarity, thereby capturing structural cues of various renal tissues. The introduced loss function applies shape constraints, enabling the model to better represent the complex morphology of glomeruli against challenging backgrounds. To enhance shape consistency within glomeruli while ensuring discriminability from external tissues, we develop a contrastive learning approach that utilizes extracted structural cues. This encourages the network to effectively learn internal shape constraints and differentiate between distinct regions in feature space. Finally, we implement a multi-scale convolutional attention mechanism that integrates spatial and channel attention, improving the capture of structural features across scales. Experimental results demonstrate that our method significantly enhances segmentation accuracy across multiple public datasets, showcasing the potential of contrastive learning in renal pathology.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105698"},"PeriodicalIF":4.2,"publicationDate":"2025-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144906953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
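
One plausible reading of the superpixel consistency constraint is an InfoNCE-style loss that pulls each pixel toward its own superpixel prototype and away from other regions. The sketch below illustrates that reading under assumed shapes; the paper's actual loss formulation may differ.

```python
import torch
import torch.nn.functional as F

def superpixel_consistency_loss(feats: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """Contrast pixel features against superpixel prototypes (InfoNCE-style).

    feats:  (N, D) pixel features (e.g. flattened decoder features)
    labels: (N,)   superpixel id of each pixel, values in [0, S)
    """
    feats = F.normalize(feats, dim=1)
    num_regions = int(labels.max().item()) + 1
    # Region prototypes: mean feature per superpixel.
    protos = torch.zeros(num_regions, feats.shape[1], device=feats.device)
    protos.index_add_(0, labels, feats)
    counts = torch.bincount(labels, minlength=num_regions).clamp(min=1).unsqueeze(1)
    protos = F.normalize(protos / counts, dim=1)
    # Similarity of every pixel to every prototype; its own region is the positive.
    logits = feats @ protos.t() / tau             # (N, S)
    return F.cross_entropy(logits, labels)

feats = torch.randn(1000, 32)
labels = torch.randint(0, 50, (1000,))
print(superpixel_consistency_loss(feats, labels).item())
```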

ECNet: An edge-guided and cross-image perception network for collaborative camouflaged object detection
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-08-14, DOI: 10.1016/j.imavis.2025.105697
Shiyuan Li, Hongbo Bi, Disen Mo, Cong Zhang, Yue Li
{"title":"ECNet: An edge-guided and cross-image perception network for collaborative camouflaged object detection","authors":"Shiyuan Li ,&nbsp;Hongbo Bi ,&nbsp;Disen Mo ,&nbsp;Cong Zhang ,&nbsp;Yue Li","doi":"10.1016/j.imavis.2025.105697","DOIUrl":"10.1016/j.imavis.2025.105697","url":null,"abstract":"<div><div>Traditional camouflaged object detection (COD) methods typically focus on individual images, ignoring the contextual information from multiple related images. However, objects are often captured in multiple images or from different viewpoints in real scenarios. Leveraging collaborative information from multiple images can achieve more robust and accurate detection. This collaborative approach, known as “Collaborative Camouflaged Object Detection (CoCOD)”, addresses the limitations of single-image methods by exploiting complementary information from multiple images, enhancing detection performance. Recent advancements in CoCOD have shown notable progress. However, challenges remain in effectively extracting multi-scale features and facilitating cross-attention feature interactions. To address these limitations, we propose a novel framework, named the Edge-Guided and Cross-Image Perception Network (ECNet). The ECNet consists of two core components: the edge-guided scale module (EGSM) and the cross-image perception enhancement module (CPEM). Specifically, EGSM enhances feature extraction by integrating edge-aware guidance with multi-scale asymmetric convolutions. Meanwhile, CPEM strengthens cross-image feature interaction by introducing collaborative attention, which reinforces semantic consistency among correlated targets and suppresses distracting background information. By integrating edge-aware features across multiple spatial scales and cross-image semantic consistency, ECNet effectively addresses the challenges of camouflage detection in visually complex scenarios. Extensive experiments on the CoCOD8K dataset demonstrate that our proposed ECNet outperforms 18 state-of-the-art COD methods, 11 co-salient object detection (CoSOD) models, and 4 CoCOD approaches, as evaluated by six widely used metrics.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105697"},"PeriodicalIF":4.2,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144865873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
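
The multi-scale asymmetric convolutions mentioned for the EGSM can be pictured as parallel 1xk / kx1 branches at several kernel sizes whose outputs are fused. The module below is a generic illustration of that pattern, not the released ECNet block.

```python
import torch
import torch.nn as nn

class AsymmetricMultiScaleConv(nn.Module):
    """Parallel asymmetric (1xk then kx1) convolutions at several kernel sizes.

    Decomposing a kxk kernel into 1xk + kx1 keeps elongated, edge-like
    receptive fields cheap, which is why such branches pair naturally
    with edge guidance.
    """

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList()
        for k in kernel_sizes:
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2)),
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0)),
            ))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = sum(branch(x) for branch in self.branches)  # merge multi-scale branches
        return self.fuse(out) + x                         # residual connection

x = torch.randn(1, 32, 64, 64)
print(AsymmetricMultiScaleConv(32)(x).shape)              # torch.Size([1, 32, 64, 64])
```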

A non-local adaptive hypothesis propagation for multi-view stereo
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-08-11, DOI: 10.1016/j.imavis.2025.105704
Yufeng Yin, Xiaoyan Liu, Qing Fan, Zichao Zhang
{"title":"A non-local adaptive hypothesis propagation for multi-view stereo","authors":"Yufeng Yin ,&nbsp;Xiaoyan Liu ,&nbsp;Qing Fan ,&nbsp;Zichao Zhang","doi":"10.1016/j.imavis.2025.105704","DOIUrl":"10.1016/j.imavis.2025.105704","url":null,"abstract":"<div><div>Hypothesis propagation is a central component of PatchMatch-based multi-view stereo and significantly impacts the reconstruction performance. However, current propagation methods rely on photometric consistency to guide hypothesis propagation within a local area. When the centroid is located in a low-textured area with reflective or refractive properties, high chromatic aberration may cause the multi-view matching to fall into a local optimum that fails to provide reliable hypotheses, leading to reconstruction errors. To address this problem, we propose a non-local adaptive hypothesis propagation scheme. First, we evenly distribute sampling points in eight directions on the checkerboard to quickly determine reliable initial hypotheses. Then, starting from the initial hypotheses generated in the eight directions of the checkerboard, the hypotheses are adaptively propagated to non-checkerboard areas based on matching cost, reducing interference from unreliable photometric consistency and improving reconstruction performance in challenging areas. The test results on large-scale benchmarks show that the proposed scheme has significant advantages in reconstructing challenging areas. It can significantly improve the completeness of point clouds from current state-of-the-art methods and outperform existing propagation schemes.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105704"},"PeriodicalIF":4.2,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144828813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
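
The checkerboard sampling in eight directions amounts to comparing the matching cost of hypotheses sampled along eight offsets and adopting the cheapest one. The sketch below shows that selection step for a single pixel; the cost function and propagation schedule are placeholders rather than the paper's full scheme.

```python
import numpy as np

# Eight propagation directions (dy, dx) around a pixel on the checkerboard.
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]

def propagate_best_hypothesis(depth, cost, y, x, step=2):
    """Replace the hypothesis at (y, x) if a sampled neighbor has lower matching cost.

    depth, cost: (H, W) current per-pixel depth hypotheses and their matching costs
    step:        sampling distance along each direction
    """
    h, w = depth.shape
    best_d, best_c = depth[y, x], cost[y, x]
    for dy, dx in DIRECTIONS:
        ny, nx = y + dy * step, x + dx * step
        if 0 <= ny < h and 0 <= nx < w and cost[ny, nx] < best_c:
            best_d, best_c = depth[ny, nx], cost[ny, nx]   # adopt the cheaper hypothesis
    return best_d, best_c

depth = np.random.rand(32, 32)
cost = np.random.rand(32, 32)
print(propagate_best_hypothesis(depth, cost, 16, 16))
```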

Distributed collaborative machine learning in real-world application scenario: A white blood cell subtypes classification case study
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-08-11, DOI: 10.1016/j.imavis.2025.105673
Lorenzo Putzu, Simone Porcu, Andrea Loddo
{"title":"Distributed collaborative machine learning in real-world application scenario: A white blood cell subtypes classification case study","authors":"Lorenzo Putzu ,&nbsp;Simone Porcu ,&nbsp;Andrea Loddo","doi":"10.1016/j.imavis.2025.105673","DOIUrl":"10.1016/j.imavis.2025.105673","url":null,"abstract":"<div><div>White blood cell (WBC) subtype classification is a critical step in monitoring an individual’s health. However, it remains a challenging task due to the significant morphological variability of WBCs and the domain shift introduced by differing acquisition protocols across hospitals. Numerous approaches have been proposed to mitigate domain shift, including supervised and unsupervised domain adaptation, as well as domain generalisation. These methods, however, require a suitable amount of representative target images, even if unlabelled, or a suitable amount of images from multiple sources, which may not be feasible due to privacy regulations. In this study, we explore an alternative paradigm, known as <em>Distributed Collaborative Machine Learning</em> (DCML), which consists of exploiting images from different sources in a privacy-preserving setup. Although DCML methods seem well suited to this application, to the best of our knowledge, they have not been used for this task or to address the above-mentioned issues. However, we argue that DCML deserves further consideration in medical images as a potential alternative solution against domain shift in a privacy-preserving setup. To substantiate our view, we consider three DCML methods: early and late fusion and federated learning approaches, each offering distinct trade-offs in terms of training constraints, computational overhead and communications costs. We then conduct an extensive, cross-dataset experimental evaluation on four benchmark datasets and provide evidence that even <em>simple</em> implementations of DCML methods can effectively mitigate domain shift in WBC classification tasks.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105673"},"PeriodicalIF":4.2,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144828809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
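
Of the three DCML strategies compared, federated learning is the most self-contained to sketch: each hospital trains locally and only model weights are aggregated, so images never leave their source. Below is plain federated averaging (FedAvg) of PyTorch state dicts with sample-count weighting, offered as a generic illustration rather than the paper's experimental setup.

```python
import copy
from typing import Dict, List

import torch

def federated_average(state_dicts: List[Dict[str, torch.Tensor]],
                      num_samples: List[int]) -> Dict[str, torch.Tensor]:
    """Weighted average of client model weights (FedAvg).

    state_dicts: one state_dict per client, all with identical keys/shapes
    num_samples: number of local training images per client (weighting factor)
    """
    total = float(sum(num_samples))
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(state_dicts, num_samples))
    return avg

# Toy round: three "hospitals" share weights of the same tiny classifier.
clients = [torch.nn.Linear(8, 5) for _ in range(3)]
global_weights = federated_average([c.state_dict() for c in clients], [120, 300, 80])
for c in clients:
    c.load_state_dict(global_weights)   # broadcast the aggregated model back
```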

Spatiotemporal XAI: Explaining video regression models in echocardiography videos for ejection fraction prediction
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-08-08, DOI: 10.1016/j.imavis.2025.105691
Yakup Abrek Er, Arda Guler, Mehmet Cagri Demir, Hande Uysal, Gamze Babur Guler, Ilkay Oksuz
{"title":"Spatiotemporal XAI: Explaining video regression models in echocardiography videos for ejection fraction prediction","authors":"Yakup Abrek Er ,&nbsp;Arda Guler ,&nbsp;Mehmet Cagri Demir ,&nbsp;Hande Uysal ,&nbsp;Gamze Babur Guler ,&nbsp;Ilkay Oksuz","doi":"10.1016/j.imavis.2025.105691","DOIUrl":"10.1016/j.imavis.2025.105691","url":null,"abstract":"<div><div>Deep learning has showcased unprecedented success in automating echocardiography analysis. However, most of the deep learning algorithms are hindered at clinical translation due to their black-box nature. This paper aims to tackle this issue by quantitatively evaluating video regression models’ focus on the left ventricle (LV) for ejection fraction (EF) prediction task spatiotemporally in apical 4 chamber (A4C) echocardiograms using a gradient-based saliency method. We performed a quantitative evaluation to assess the ratio of how many of the maximum absolute gradient values of the deep learning models fall on the left ventricle for the video regression task of ejection fraction prediction. Then, we extend the experiment and pick the most important gradients as the segmentation size and check the ratio of intersection. Finally, we picked temporally aligned sub-clips from end diastole to end systole and calculated the expected accuracies of the mentioned metrics in time. All tests are performed in 3 different models with different architectures and results are examined quantitatively. The filtered test set includes 1209 A4C echo videos of with mean EF of 55.5%. Trained models showed 0.73 to 0.83 Pointing Game scores, where it was 0.09 for the baseline random model. <span><math><msub><mrow><mi>m</mi></mrow><mrow><mi>G</mi><mi>T</mi></mrow></msub></math></span> intersection score was 0.46 to 0.50 for the trained models, whereas the random model’s score was 0.18. Models have higher pointing game scores on the end diastole and end systole compared to intermediate frames. Transformer based models’ <span><math><msub><mrow><mi>m</mi></mrow><mrow><mi>G</mi><mi>T</mi></mrow></msub></math></span> intersection scores were negatively correlated with their error rate. All models located the left ventricle successfully and their localization performance was generally better in semantically important frames rather than the larger target area. This observation from the spatiotemporal analysis suggests possible clinical relevance to model reasoning.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105691"},"PeriodicalIF":4.2,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144840768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
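
Both evaluation ideas are simple to compute: the Pointing Game checks whether the single largest absolute gradient falls inside the left-ventricle mask, and the m_GT intersection takes the top-k salient pixels (k equal to the mask size) and measures the fraction inside the mask. The sketch below follows that description; the array names are assumptions.

```python
import numpy as np

def pointing_game_hit(saliency: np.ndarray, lv_mask: np.ndarray) -> bool:
    """True if the maximum |gradient| location falls on the left-ventricle mask."""
    y, x = np.unravel_index(np.abs(saliency).argmax(), saliency.shape)
    return bool(lv_mask[y, x])

def topk_mask_intersection(saliency: np.ndarray, lv_mask: np.ndarray) -> float:
    """Fraction of the top-k salient pixels inside the mask, with k = mask size."""
    k = int(lv_mask.sum())
    flat = np.abs(saliency).ravel()
    topk = np.argpartition(flat, -k)[-k:]         # indices of the k largest saliencies
    return float(lv_mask.ravel()[topk].sum()) / k

saliency = np.random.rand(112, 112)               # stand-in for a per-frame gradient map
lv_mask = np.zeros((112, 112), dtype=bool)
lv_mask[30:70, 40:80] = True                      # stand-in LV segmentation
print(pointing_game_hit(saliency, lv_mask), topk_mask_intersection(saliency, lv_mask))
```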

InpaintingPose: Enhancing human pose transfer by image inpainting
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-08-08, DOI: 10.1016/j.imavis.2025.105690
Wei Zhang, Chenglin Zhou, Xuekang Peng, Zhichao Lian
{"title":"InpaintingPose: Enhancing human pose transfer by image inpainting","authors":"Wei Zhang,&nbsp;Chenglin Zhou,&nbsp;Xuekang Peng,&nbsp;Zhichao Lian","doi":"10.1016/j.imavis.2025.105690","DOIUrl":"10.1016/j.imavis.2025.105690","url":null,"abstract":"<div><div>Human pose transfer involves transforming a human subject in a reference image from a source pose to a target pose while maintaining consistency in both appearance and background. Most existing methods treat the appearance and background in the reference image as a unified entity, which causes the background to be disrupted by pose transformations and prevents the model from focusing on the complex relationship between appearance and pose. In this paper, we propose InpaintingPose, a novel human pose transfer framework based on image inpainting, which enables precise pose control without affecting the background. InpaintingPose separates the background from the appearance, applying transformations only where necessary. This strategy prevents the background from being affected by pose transformations and allows the model to focus on the coupling between appearance and pose. Additionally, we introduce an appearance control mechanism to ensure appearance consistency between the generated images and the reference images. Finally, we propose an initial noise optimization strategy to address the instability in generating human images with extremely bright backgrounds. By decoupling appearance and background, InpaintingPose can also recombine the appearance and background from different reference images to produce realistic human images. Extensive experiments demonstrate the effectiveness of our method, achieving state-of-the-art FID scores of 4.74 and 26.74 on DeepFashionv2 and TikTok datasets, respectively, significantly outperforming existing approaches.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105690"},"PeriodicalIF":4.2,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144858353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
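
The appearance/background decoupling described here ultimately relies on masked compositing: only the person region is re-generated in the target pose and then blended back over the untouched background. A minimal, framework-free sketch of that blending step follows; the mask source and the generator producing the person region are placeholders, not the paper's pipeline.

```python
import numpy as np

def composite_pose_transfer(reference: np.ndarray,
                            generated_person: np.ndarray,
                            person_mask: np.ndarray) -> np.ndarray:
    """Blend a re-generated person region over the untouched reference background.

    reference:        (H, W, 3) original image, supplies the background
    generated_person: (H, W, 3) person rendered in the target pose (inpainted region)
    person_mask:      (H, W)    1 where the target-pose person should appear
    """
    mask = person_mask[..., None].astype(np.float32)
    return (mask * generated_person + (1.0 - mask) * reference).astype(reference.dtype)

reference = np.random.randint(0, 255, (256, 192, 3), dtype=np.uint8)
generated = np.random.randint(0, 255, (256, 192, 3), dtype=np.uint8)
mask = np.zeros((256, 192), dtype=np.uint8)
mask[40:220, 50:140] = 1
print(composite_pose_transfer(reference, generated, mask).shape)  # (256, 192, 3)
```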

Diverse Information Aggregation with Adaptive Graph Construction and prompts for deepfake detection
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-08-08, DOI: 10.1016/j.imavis.2025.105682
Zhenhua Bai, Qiangchang Wang, Lu Yang, Xinxin Zhang, Yanbo Gao, Yilong Yin
{"title":"Diverse Information Aggregation with Adaptive Graph Construction and prompts for deepfake detection","authors":"Zhenhua Bai ,&nbsp;Qiangchang Wang ,&nbsp;Lu Yang ,&nbsp;Xinxin Zhang ,&nbsp;Yanbo Gao ,&nbsp;Yilong Yin","doi":"10.1016/j.imavis.2025.105682","DOIUrl":"10.1016/j.imavis.2025.105682","url":null,"abstract":"<div><div>Due to the misuse of face manipulation techniques, there has been increasing attention on deepfake detection. Recently, some methods have employed ViTs to capture the inconsistency in forged faces, providing a global perspective for exploring diverse and generalized patterns to avoid overfitting. These methods typically divided an image into fixed-shape patches. However, each patch contains only a tiny fraction of facial regions, thereby inherently lacking explicit semantic and structural relations with other patches, which is insufficient to model the global context information effectively. To enhance the global context interaction, a Diverse INformation Aggregation (DINA) framework is proposed for deepfake detection, which consists of two information aggregation modules: Adaptive Graph Convolution Network (AGCN) and Multi-Scale Prompt Fusion (MSPF). Specifically, the AGCN utilizes a novel strategy to construct neighbors of each token based on spatial and feature relations. Then, a graph convolution network is applied to aggregate information from different tokens to form a token with rich semantics and local information, termed the group token. These group tokens can be used to form robust representations of global information. Moreover, the MSPF utilizes prompts to incorporate unique forgery traces from complementary information, i.e., multi-scale and frequency information, into group tokens in a fine-grained and adaptive manner, which provides extra information to further improve the robustness of group tokens. Consequently, our model can learn robust global context-aware representations, capturing more generalized forgery patterns from global information. The proposed framework outperforms the state-of-the-art competitors on several benchmarks, showing the generalization ability of our method.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105682"},"PeriodicalIF":4.2,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144809470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
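
The adaptive graph construction attributed to the AGCN (neighbors chosen by both spatial proximity and feature similarity, then aggregated into group tokens) can be illustrated with a small k-nearest-neighbor sketch. The distance blending and mean aggregation below are assumptions made for illustration, not the paper's exact operators.

```python
import torch

def build_adaptive_neighbors(tokens: torch.Tensor, coords: torch.Tensor,
                             k: int = 8, alpha: float = 0.5) -> torch.Tensor:
    """Pick k neighbors per token using a blend of feature and spatial distance.

    tokens: (N, D) patch-token features
    coords: (N, 2) patch-center coordinates (normalized)
    alpha:  weight between feature distance (alpha) and spatial distance (1 - alpha)
    Returns neighbor indices of shape (N, k).
    """
    feat_d = torch.cdist(tokens, tokens)           # (N, N) feature distances
    spat_d = torch.cdist(coords, coords)           # (N, N) spatial distances
    dist = alpha * feat_d / feat_d.max() + (1 - alpha) * spat_d / spat_d.max()
    dist.fill_diagonal_(float("inf"))              # exclude self from neighbors
    return dist.topk(k, largest=False).indices     # (N, k)

def aggregate_group_tokens(tokens: torch.Tensor, neighbors: torch.Tensor) -> torch.Tensor:
    """Mean aggregation over each token's neighborhood (a one-hop graph-conv step)."""
    return torch.cat([tokens.unsqueeze(1), tokens[neighbors]], dim=1).mean(dim=1)

tokens = torch.randn(196, 128)                     # e.g. 14x14 ViT patch tokens
ys, xs = torch.meshgrid(torch.arange(14), torch.arange(14), indexing="ij")
coords = torch.stack([ys.flatten(), xs.flatten()], dim=1).float() / 13.0
group = aggregate_group_tokens(tokens, build_adaptive_neighbors(tokens, coords))
print(group.shape)                                 # torch.Size([196, 128])
```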