Image and Vision Computing: Latest Articles

Invariant prompting with classifier rectification for continual learning
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-07-11, DOI: 10.1016/j.imavis.2025.105641
Chunsing Lo, Hao Zhang, Andy J. Ma
{"title":"Invariant prompting with classifier rectification for continual learning","authors":"Chunsing Lo ,&nbsp;Hao Zhang ,&nbsp;Andy J. Ma","doi":"10.1016/j.imavis.2025.105641","DOIUrl":"10.1016/j.imavis.2025.105641","url":null,"abstract":"<div><div>Continual learning aims to train a model capable of continuously learning and retaining knowledge from a sequence of tasks. Recently, prompt-based continual learning has been proposed to leverage the generalization ability of a pre-trained model with task-specific prompts for instruction. Prompt component training is a promising approach to enhancing the plasticity for prompt-based continual learning. Nevertheless, this approach changes the instructed features to be noisy for query samples from the old tasks. Additionally, the problem of scale misalignment in classifier logits between different tasks leads to misclassification. To address these issues, we propose an invariant Prompting with Classifier Rectification (iPrompt-CR) method for prompt-based continual learning. In our method, the learnable keys corresponding to each new-task component are constrained to be orthogonal to the query prototype in the old tasks for invariant prompting, which reduces feature representation noise. After prompt learning, instructed features are sampled from Gaussian-distributed prototypes for classifier rectification with unified logit scale for more accurate predictions. Extensive experimental results on four benchmark datasets demonstrate that our method outperforms the state of the arts in both class-incremental learning and more realistic general incremental learning scenarios.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105641"},"PeriodicalIF":4.2,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144632927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
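The core constraint described above is geometric: keys learned for a new task's prompt components should stay orthogonal to the query prototypes kept from old tasks. The abstract does not give the exact loss, so the sketch below shows one generic way such a penalty is commonly written; all names and shapes are illustrative, not the paper's.

```python
import torch
import torch.nn.functional as F

def orthogonality_loss(new_keys: torch.Tensor, old_prototypes: torch.Tensor) -> torch.Tensor:
    """Penalize alignment between new-task prompt keys and old-task query prototypes.

    new_keys:       (K, D) learnable keys for the new task's prompt components.
    old_prototypes: (P, D) frozen query prototypes accumulated from old tasks.
    Both are L2-normalized, so the penalty is the mean squared cosine similarity;
    it is zero exactly when every new key is orthogonal to every old prototype.
    """
    new_keys = F.normalize(new_keys, dim=-1)
    old_prototypes = F.normalize(old_prototypes, dim=-1)
    cos = new_keys @ old_prototypes.t()          # (K, P) pairwise cosine similarities
    return (cos ** 2).mean()

# Toy usage: 8 new keys, 20 old prototypes, 256-d embeddings (dimensions assumed).
loss = orthogonality_loss(torch.randn(8, 256, requires_grad=True), torch.randn(20, 256))
```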
DFF-Net: Deep Feature Fusion Network for low-light image enhancement
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-07-09, DOI: 10.1016/j.imavis.2025.105645
Hongchang Zhang, Longtao Wang, Qizhan Zou, Juan Zeng
{"title":"DFF-Net: Deep Feature Fusion Network for low-light image enhancement","authors":"Hongchang Zhang,&nbsp;Longtao Wang,&nbsp;Qizhan Zou,&nbsp;Juan Zeng","doi":"10.1016/j.imavis.2025.105645","DOIUrl":"10.1016/j.imavis.2025.105645","url":null,"abstract":"<div><div>Low-light image enhancement methods are designed to improve brightness, recover texture details, restore color fidelity and suppress noise in images captured in low-light environments. Although many low-light image enhancement methods have been proposed, existing methods still face two limitations: (1) the inability to achieve all these objectives at the same time; and (2) heavy reliance on supervised methods that limits practical applicability in real-world scenarios. To overcome these challenges, we propose a Deep Feature Fusion Network (DFF-Net) for low-light image enhancement which builds upon Zero-DCE’s light-enhancement curve. The network is trained without requiring any paired datasets through a set of carefully designed non-reference loss functions. Furthermore, we develop a Fast Deep-level Residual Block (FDRB) to strengthen DFF-Net’s performance, which demonstrates superior performance in both feature extraction and computational efficiency. Comprehensive quantitative and qualitative experiments demonstrate that DFF-Net achieves superior performance in both subjective visual quality and downstream computer vision tasks. In low-light image enhancement experiments, DFF-Net achieves either optimal or sub-optimal metrics across all six public datasets compared to other unsupervised methods. And in low-light object detection experiments, DFF-Net achieves maximum scores in four key metrics on the ExDark dataset: P at 83.3%, F1 at 72.8%, mAP50 at 74.9%, and mAP50-95 at 48.9%. Code is available at <span><span>https://github.com/WangL0ngTa0/DFF-Net</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"161 ","pages":"Article 105645"},"PeriodicalIF":4.2,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144596351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
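DFF-Net is said to build on Zero-DCE's light-enhancement curve, which brightens an image by repeatedly applying a per-pixel quadratic curve. Below is a minimal sketch of that curve application only; the curve-map predictor, the FDRB block, and the non-reference losses that DFF-Net adds are not shown, and the tensor shapes are assumptions.

```python
import torch

def apply_le_curves(x: torch.Tensor, curve_maps: torch.Tensor) -> torch.Tensor:
    """Iteratively apply Zero-DCE-style light-enhancement curves.

    x:          (B, 3, H, W) low-light image with values in [0, 1].
    curve_maps: (B, 3*n, H, W) per-pixel curve parameters in [-1, 1],
                one 3-channel map per iteration (n iterations total).
    Each step computes x <- x + a * x * (1 - x), which brightens dark pixels
    while keeping values inside [0, 1].
    """
    n = curve_maps.shape[1] // x.shape[1]
    for a in torch.chunk(curve_maps, n, dim=1):
        x = x + a * x * (1.0 - x)
    return x

# Toy usage with 8 curve iterations (an assumed setting, as in Zero-DCE).
enhanced = apply_le_curves(torch.rand(1, 3, 64, 64), torch.rand(1, 24, 64, 64) * 2 - 1)
```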
ACMC: Adaptive cross-modal multi-grained contrastive learning for continuous sign language recognition
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-07-09, DOI: 10.1016/j.imavis.2025.105622
Xu-Hua Yang, Hong-Xiang Hu, XuanYu Lin
{"title":"ACMC: Adaptive cross-modal multi-grained contrastive learning for continuous sign language recognition","authors":"Xu-Hua Yang,&nbsp;Hong-Xiang Hu,&nbsp;XuanYu Lin","doi":"10.1016/j.imavis.2025.105622","DOIUrl":"10.1016/j.imavis.2025.105622","url":null,"abstract":"<div><div>Continuous sign language recognition helps the hearing-impaired community participate in social communication by recognizing the semantics of sign language video. However, the existing CSLR methods usually only implement cross-modal alignment at the sentence level or frame level, and do not fully consider the potential impact of redundant frames and semantically independent gloss identifiers on the recognition results. In order to improve the limitations of the above methods, we propose an adaptive cross-modal multi-grained contrastive learning (ACMC) for continuous sign language recognition, which achieve more accurate cross-modal semantic alignment through a multi-grained contrast mechanism. First, the ACMC uses the frame extractor and the temporal modeling module to obtain the fine-grained and coarse-grained features of the visual modality in turn, and extracts the fine-grained and coarse-grained features of the text modality through the CLIP text encoder. Then, the ACMC adopts coarse-grained contrast and fine-grained contrast methods to effectively align the features of visual and text modalities from global and local perspectives, and alleviate the semantic interference caused by redundant frames and semantically independent gloss identifiers through cross-grained contrast. In addition, in the video frame extraction stage, we design an adaptive learning module to strengthen the features of key regions of video frames through the calculated discrete spatial feature decision matrix, and adaptively fuse the convolution features of key frames with the trajectory information between adjacent frames, thereby reducing the computational cost. Experimental results show that the proposed ACMC model achieves very competitive recognition results on sign language datasets such as PHOENIX14, PHOENIX14-T and CSL-Daily.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"161 ","pages":"Article 105622"},"PeriodicalIF":4.2,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144596350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
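The coarse-grained video-sentence contrast mentioned above is typically realized as a symmetric InfoNCE loss over a batch of paired embeddings; the fine-grained and cross-grained terms follow the same pattern at frame/gloss level. A minimal sketch, assuming pooled (B, D) features rather than ACMC's exact formulation:

```python
import torch
import torch.nn.functional as F

def symmetric_infonce(video_feats: torch.Tensor,
                      text_feats: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired video/text embeddings.

    video_feats, text_feats: (B, D); the i-th video and i-th sentence form a pair.
    Matched pairs are pulled together; all other in-batch combinations are pushed apart,
    in both the video-to-text and text-to-video directions.
    """
    v = F.normalize(video_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    logits = v @ t.t() / temperature                    # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)  # diagonal entries are positives
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = symmetric_infonce(torch.randn(16, 512), torch.randn(16, 512))
```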
BSMEF: Optimized multi-exposure image fusion using B-splines and Mamba
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-07-08, DOI: 10.1016/j.imavis.2025.105660
Jinyong Cheng, Qinghao Cui, Guohua Lv
{"title":"BSMEF: Optimized multi-exposure image fusion using B-splines and Mamba","authors":"Jinyong Cheng ,&nbsp;Qinghao Cui ,&nbsp;Guohua Lv","doi":"10.1016/j.imavis.2025.105660","DOIUrl":"10.1016/j.imavis.2025.105660","url":null,"abstract":"<div><div>In recent years, multi-exposure image fusion has been widely applied to process overexposed or underexposed images due to its simplicity, effectiveness, and low cost. With the development of deep learning techniques, related fusion methods have been continuously optimized. However, retaining global information from source images while preserving fine local details remains challenging, especially when fusing images with extreme exposure differences, where boundary transitions often exhibit shadows and noise. To address this, we propose a multi-exposure image fusion network model, BSMEF, based on B-Spline basis functions and Mamba. The B-Spline basis function, known for its smoothness, reduces edge artifacts and enables smooth transitions between images with varying exposure levels. In BSMEF, the feature extraction module, combining B-Spline and deformable convolutions, preserves global features while effectively extracting fine-grained local details. Additionally, we design a feature enhancement module based on Mamba blocks, leveraging its powerful global perception ability to capture contextual information. Furthermore, the fusion module integrates three feature enhancement methods: B-Spline basis functions, attention mechanisms, and Fourier transforms, addressing shadow and noise issues at fusion boundaries and enhancing the focus on important features. Experimental results demonstrate that BSMEF outperforms existing methods across multiple public datasets.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"161 ","pages":"Article 105660"},"PeriodicalIF":4.2,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144605700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
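The smoothness argument rests on the B-spline basis itself: at any point the four active cubic basis functions are non-negative and sum to one, so blended features transition without seams. The sketch below evaluates the standard uniform cubic B-spline basis; how BSMEF wires these weights into its convolutions is not described in the abstract and is not attempted here.

```python
import torch

def cubic_bspline_basis(t: torch.Tensor) -> torch.Tensor:
    """Uniform cubic B-spline basis functions evaluated at t in [0, 1].

    Returns a tensor of shape (..., 4) whose last dimension holds B0..B3.
    The four weights are non-negative and sum to 1, which is what gives
    B-spline blending its smooth, artifact-free transitions.
    """
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
    b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
    b3 = t ** 3 / 6
    return torch.stack((b0, b1, b2, b3), dim=-1)

w = cubic_bspline_basis(torch.linspace(0, 1, 5))
assert torch.allclose(w.sum(dim=-1), torch.ones(5))  # partition of unity
```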
BCDPose: Diffusion-based 3D Human Pose Estimation with bone-chain prior knowledge
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-07-08, DOI: 10.1016/j.imavis.2025.105636
Xing Liu, Hao Tang
{"title":"BCDPose: Diffusion-based 3D Human Pose Estimation with bone-chain prior knowledge","authors":"Xing Liu ,&nbsp;Hao Tang","doi":"10.1016/j.imavis.2025.105636","DOIUrl":"10.1016/j.imavis.2025.105636","url":null,"abstract":"<div><div>Recently, diffusion-based methods have emerged as the golden standard in 3D Human Pose Estimation task, largely thanks to their exceptional generative capabilities. In the past, researchers have made concerted efforts to develop spatial and temporal denoisers utilizing transformer blocks in diffusion-based methods. However, existing Transformer-based denoisers in diffusion models often overlook implicit structural and kinematic supervision derived from prior knowledge of human biomechanics, including prior knowledge of human bone-chain structure and joint kinematics, which could otherwise enhance performance. We hold the view that joint movements are intrinsically constrained by neighboring joints within the bone-chain and by kinematic hierarchies. Then, we propose a <strong>B</strong>one-<strong>C</strong>hain enhanced <strong>D</strong>iffusion 3D pose estimation method, or <strong>BCDPose</strong>. In this method, we introduce a novel Bone-Chain prior knowledge enhanced transformer blocks within the denoiser to reconstruct contaminated 3D pose data. Additionally, we propose the Joint-DoF Hierarchical Temporal Embedding framework, which incorporates prior knowledge of joint kinematics. By integrating body hierarchy and temporal dependencies, this framework effectively captures the complexity of human motion, thereby enabling accurate and robust pose estimation. This innovation proposes a comprehensive framework for 3D human pose estimation by explicitly modeling joint kinematics, thereby overcoming the limitations of prior methods that fail to capture the intrinsic dynamics of human motion. We conduct extensive experiments on various open benchmarks to evaluate the effectiveness of BCDPose. The results convincingly demonstrate that BCDPose achieves highly competitive results compared with other state-of-the-art models. This underscores its promising potential and practical applicability in 2D–3D human pose estimation tasks. We plan to release our code publicly for further research.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105636"},"PeriodicalIF":4.2,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144632926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
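Diffusion-based pose estimators of this kind are trained by first "contaminating" clean poses with scheduled Gaussian noise and then asking the denoiser to undo it. The sketch below shows that generic forward-noising step under standard DDPM assumptions; BCDPose's bone-chain transformer and Joint-DoF embedding are not reproduced here.

```python
import torch

def diffuse_poses(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor):
    """Forward diffusion step used to contaminate clean 3D poses for training.

    x0:        (B, J, 3) clean 3D joint coordinates.
    t:         (B,) integer timesteps.
    alpha_bar: (T,) cumulative product of the noise schedule's alphas.
    Returns the noisy poses x_t and the Gaussian noise that was added; a denoiser
    (here, BCDPose's transformer) is then trained to recover x0 (or the noise) from x_t.
    """
    noise = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, 1, 1)                       # broadcast per sample
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise
    return xt, noise

# Toy usage with an assumed linear beta schedule over T = 1000 steps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1 - betas, dim=0)
xt, eps = diffuse_poses(torch.randn(4, 17, 3), torch.randint(0, T, (4,)), alpha_bar)
```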
MaxSwap-Enhanced Knowledge Consistency Learning for long-tailed recognition
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-07-07, DOI: 10.1016/j.imavis.2025.105643
Shengnan Fan, Zhilei Chai, Zhijun Fang, Yuying Pan, Hui Shen, Xiangyu Cheng, Qin Wu
{"title":"MaxSwap-Enhanced Knowledge Consistency Learning for long-tailed recognition","authors":"Shengnan Fan,&nbsp;Zhilei Chai,&nbsp;Zhijun Fang,&nbsp;Yuying Pan,&nbsp;Hui Shen,&nbsp;Xiangyu Cheng,&nbsp;Qin Wu","doi":"10.1016/j.imavis.2025.105643","DOIUrl":"10.1016/j.imavis.2025.105643","url":null,"abstract":"<div><div>Deep learning has made significant progress in image classification. However, real-world datasets often exhibit a long-tailed distribution, where a few head classes dominate while many tail classes have very few samples. This imbalance leads to poor performance on tail classes. To address this issue, we propose MaxSwap-Enhanced Knowledge Consistency Learning which includes two core components: Knowledge Consistency Learning and MaxSwap for Confusion Suppression. Knowledge Consistency Learning leverages the outputs from different augmented views as soft labels to capture inter-class similarities and introduces a consistency constraint to enforce output consistency across different perturbations, which enables tail classes to effectively learn from head classes with similar features. To alleviate the bias towards head classes, we further propose a MaxSwap for Confusion Suppression to adaptively adjust the soft labels when the model makes incorrect predictions which mitigates overconfidence in incorrect predictions. Experimental results demonstrate that our method achieves significant improvements on long-tailed datasets such as CIFAR10-LT, CIFAR100-LT, ImageNet-LT, and Places-LT, which validates the effectiveness of our approach.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"161 ","pages":"Article 105643"},"PeriodicalIF":4.2,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144596349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
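The abstract only says that soft labels are "adaptively adjusted" when the prediction is wrong; one plausible reading of the name MaxSwap is swapping the probability mass of the wrongly dominant class with that of the ground-truth class. The sketch below implements that reading and should be taken as a guess, not the paper's definition.

```python
import torch

def maxswap(soft_labels: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """One plausible 'MaxSwap' correction for soft labels (illustrative only).

    soft_labels: (B, C) probabilities from an augmented view used as soft labels.
    targets:     (B,) ground-truth class indices.
    Wherever the argmax of the soft label disagrees with the ground truth, the
    probability mass of the wrongly dominant class is swapped with that of the
    true class, so the corrected label no longer encourages that confusion.
    """
    out = soft_labels.clone()
    pred = soft_labels.argmax(dim=1)
    rows = torch.nonzero(pred != targets, as_tuple=True)[0]
    p_pred = out[rows, pred[rows]].clone()
    out[rows, pred[rows]] = out[rows, targets[rows]]
    out[rows, targets[rows]] = p_pred
    return out

corrected = maxswap(torch.softmax(torch.randn(8, 10), dim=1), torch.randint(0, 10, (8,)))
```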
Composed image retrieval by Multimodal Mixture-of-Expert Synergy
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-07-07, DOI: 10.1016/j.imavis.2025.105634
Wenzhe Zhai, Mingliang Gao, Gwanggil Jeon, Qiang Zhou, David Camacho
{"title":"Composed image retrieval by Multimodal Mixture-of-Expert Synergy","authors":"Wenzhe Zhai ,&nbsp;Mingliang Gao ,&nbsp;Gwanggil Jeon ,&nbsp;Qiang Zhou ,&nbsp;David Camacho","doi":"10.1016/j.imavis.2025.105634","DOIUrl":"10.1016/j.imavis.2025.105634","url":null,"abstract":"<div><div>Composed image retrieval (CIR) is essential in security surveillance, e-commerce, and social media analysis. It provides precise information retrieval and intelligent analysis solutions for various industries. The majority of existing CIR models create a pseudo-word token from the reference image, which is subsequently incorporated into the corresponding caption for the image retrieval task. However, these pseudo-word-based prompting approaches are limited when the target image entails complex modifications to the reference image, such as object removal and attribute changes. To address the issue, we propose a Multimodal Mixture-of-Expert Synergy (MMES) model to achieve effective composed image retrieval. The MMES model initially utilizes multiple Mixture of Expert (MoE) modules through the mixture expert unit to process various types of multimodal input data. Subsequently, the outputs from these expert models are fused through the cross-modal integration module. Furthermore, the fused features generate implicit text embedding prompts, which are concatenated with the relative descriptions. Finally, retrieval is conducted using a text encoder and an image encoder. The Experiments demonstrate that the proposed method outperforms state-of-the-art CIR methods on the CIRR and Fashion-IQ datasets.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"161 ","pages":"Article 105634"},"PeriodicalIF":4.2,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144581183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
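A "mixture expert unit" in this sense is usually a set of small expert networks blended by a learned gate. The sketch below is a generic softmax-gated MoE over an already-fused image+text feature; the expert design, gating inputs, and dimensions are all assumptions rather than MMES's actual modules.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Minimal softmax-gated mixture of experts (illustrative, not the paper's design).

    Each expert is a small MLP over the fused feature; a gating network produces
    per-sample weights and the expert outputs are blended accordingly.
    """
    def __init__(self, dim: int = 512, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:              # x: (B, dim)
        weights = self.gate(x).softmax(dim=-1)                        # (B, E) gate scores
        outputs = torch.stack([e(x) for e in self.experts], dim=1)    # (B, E, dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)           # gated blend

fused = SoftMoE()(torch.randn(2, 512))
```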
Label refinement for change detection in remote sensing
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-07-07, DOI: 10.1016/j.imavis.2025.105639
Zhilong Ou, Hongxing Wang, Jiawei Tan, Jiaxin Li, Ziyi Zhao, Zhangbin Qian
{"title":"Label refinement for change detection in remote sensing","authors":"Zhilong Ou ,&nbsp;Hongxing Wang ,&nbsp;Jiawei Tan ,&nbsp;Jiaxin Li ,&nbsp;Ziyi Zhao ,&nbsp;Zhangbin Qian","doi":"10.1016/j.imavis.2025.105639","DOIUrl":"10.1016/j.imavis.2025.105639","url":null,"abstract":"<div><div>Change detection in remote sensing aims to detect changes occurring in the same geographical area over time. Existing methods present two main challenges: (1) relying on single-scale features to capture multi-scale object changes, which limits their ability to effectively handle multi-scale change; and (2) misclassification issues caused by prediction uncertainty, particularly in regions near decision boundaries, leading to reduced overall detection performance. In this study, to address these limitations, we propose LRNet, a multi-scale change detection framework designed to enhance the perception of objects at varying scales and refine change region details during decoding. Abandoning the use of fixed thresholds for classification, LRNet incorporates a Label Refinement (LR) strategy that propagates information from high-confidence regions to low-confidence regions by evaluating feature-space similarity, enabling precise grouping of pixels within change regions. Extensive experiments on benchmark datasets — SYSU-CD, LEVIR-CD+, and SECOND-CD — demonstrate that LRNet outperforms state-of-the-art methods, with significant improvements of 7.9% in F1 and 12.38% in IoU on the challenging SECOND-CD dataset.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105639"},"PeriodicalIF":4.2,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144632928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
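The Label Refinement strategy is described as propagating information from high-confidence to low-confidence regions via feature-space similarity. A minimal sketch of that idea, assuming per-pixel features and change probabilities and illustrative confidence thresholds (LRNet's actual grouping rule may differ):

```python
import torch
import torch.nn.functional as F

def refine_labels(feats: torch.Tensor, probs: torch.Tensor,
                  lo: float = 0.4, hi: float = 0.6) -> torch.Tensor:
    """Similarity-based label refinement sketch (thresholds lo/hi are illustrative).

    feats: (N, D) per-pixel features; probs: (N,) predicted change probabilities.
    High-confidence pixels keep their hard label; each uncertain pixel is assigned
    to whichever high-confidence group (changed / unchanged) its feature is closer to.
    """
    changed, unchanged = probs >= hi, probs <= lo
    uncertain = (probs > lo) & (probs < hi)
    labels = (probs >= 0.5).long()
    if changed.any() and unchanged.any() and uncertain.any():
        f = F.normalize(feats, dim=-1)
        proto_c = F.normalize(f[changed].mean(dim=0), dim=-1)    # changed prototype
        proto_u = F.normalize(f[unchanged].mean(dim=0), dim=-1)  # unchanged prototype
        sim_c, sim_u = f[uncertain] @ proto_c, f[uncertain] @ proto_u
        labels[uncertain] = (sim_c > sim_u).long()
    return labels

lab = refine_labels(torch.randn(1000, 64), torch.rand(1000))
```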
Multi-scale pyramid convolution transformer for remote-sensing object detection
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-07-06, DOI: 10.1016/j.imavis.2025.105651
Jin Huagang, Zhou Yu
{"title":"Multi-scale pyramid convolution transformer for remote-sensing object detection","authors":"Jin Huagang,&nbsp;Zhou Yu","doi":"10.1016/j.imavis.2025.105651","DOIUrl":"10.1016/j.imavis.2025.105651","url":null,"abstract":"<div><div>Reliable object detection in remote sensing imageries (RSIs) is an essential and challenging task in surface monitoring. However, RSIs are normally obtained from a high-altitude top-down perspective, causing intrinsic properties such as complex background, aspect ratio, color and scale variations. These properties restrict the domain transfer of sophisticated detector on nature images to RSIs, thereby deteriorating the desired detection performance. To address this issue, we propose a multi-scale pyramid convolution Transformer (MPCViT) that alleviates the limitations of ordinary visual Transformer. Specifically, we firstly employ an improved CNN to extract image features, generating initial feature pyramid. Then, bidirectional feature aggregation strategy is further used to improve feature representation capacity through feature enhancement and aggregation steps. To facilitate deep interaction of global dependencies and local details, dual-route encoding mechanism is constructed in each Transformer encoder. During inference stage, an iterative sparse keypoint sampling head is devised to enhance the detection accuracy. The competitive experimental results on NWPU VHR-10 and DIOR verify the efficacy of the proposed MPCViT.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"161 ","pages":"Article 105651"},"PeriodicalIF":4.2,"publicationDate":"2025-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144588160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
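The bidirectional feature aggregation step can be pictured as a top-down pass that spreads coarse semantics followed by a bottom-up pass that returns fine localization cues, as in PANet-style necks. The sketch below shows only that generic pattern; MPCViT's enhancement modules, dual-route encoders, and sampling head are not modeled, and the channel/level counts are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalAggregation(nn.Module):
    """Generic top-down + bottom-up feature pyramid aggregation (illustrative only)."""

    def __init__(self, channels: int = 256, levels: int = 3):
        super().__init__()
        self.smooth = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(levels)]
        )

    def forward(self, feats):
        # feats: list of (B, C, H_i, W_i), ordered from highest to lowest resolution.
        feats = list(feats)
        # Top-down pass: propagate coarse semantics to finer levels.
        for i in range(len(feats) - 2, -1, -1):
            feats[i] = feats[i] + F.interpolate(feats[i + 1], size=feats[i].shape[-2:],
                                                mode="nearest")
        # Bottom-up pass: push fine localization cues back up the pyramid.
        for i in range(1, len(feats)):
            feats[i] = feats[i] + F.max_pool2d(feats[i - 1], kernel_size=2)
        return [conv(f) for conv, f in zip(self.smooth, feats)]

pyramid = BidirectionalAggregation()([torch.randn(1, 256, 64, 64),
                                      torch.randn(1, 256, 32, 32),
                                      torch.randn(1, 256, 16, 16)])
```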
UniFormer: Consistency regularization-based semi-supervised semantic segmentation via differential dual-branch strongly augmented perturbations
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing, Pub Date: 2025-07-05, DOI: 10.1016/j.imavis.2025.105640
Shengkun Qi, Bing Liu, Yong Zhou, Peng Liu, Chen Zhang, Siyu Chen
{"title":"UniFormer: Consistency regularization-based semi-supervised semantic segmentation via differential dual-branch strongly augmented perturbations","authors":"Shengkun Qi ,&nbsp;Bing Liu ,&nbsp;Yong Zhou ,&nbsp;Peng Liu ,&nbsp;Chen Zhang ,&nbsp;Siyu Chen","doi":"10.1016/j.imavis.2025.105640","DOIUrl":"10.1016/j.imavis.2025.105640","url":null,"abstract":"<div><div>Consistency regularization is a common approach in the field of semi-supervised semantic segmentation. Many recent methods typically adopt a dual-branch structure with strongly augmented perturbations based on the DeepLabV3+ model. However, these methods suffer from the limited receptive field of DeepLabV3+ and the lack of diversity in the predictions generated by the dual branches, leading to insufficient generalization performance. To address these issues, we propose a novel consistency regularization-based semi-supervised semantic segmentation framework, which adopts dual-branch SegFormer models as the backbone to overcome the limitations of the DeepLabV3+ model, termed UniFormer. We present a Random Strong Augmentation Perturbation (RSAP) module to enhance prediction diversity between the dual branches, thereby improving the robustness and generalization performance of UniFormer. In addition, we introduce a plug-and-play self-attention module that can effectively model the global dependencies of visual features to improve segmentation accuracy. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on most evaluation protocols across the Pascal, Cityscapes, and COCO datasets. The code and pre-trained weights are available at: <span><span>https://github.com/qskun/UniFormer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"161 ","pages":"Article 105640"},"PeriodicalIF":4.2,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
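Dual-branch consistency of this kind is usually trained by pseudo-labeling a weakly augmented view and pushing two differently strongly augmented views toward that label above a confidence threshold. The sketch below shows that generic loss for segmentation logits; the RSAP module and SegFormer backbone are not included, and the threshold value is an assumption.

```python
import torch
import torch.nn.functional as F

def dual_branch_consistency(logits_weak: torch.Tensor,
                            logits_s1: torch.Tensor,
                            logits_s2: torch.Tensor,
                            threshold: float = 0.95) -> torch.Tensor:
    """Dual-branch consistency loss sketch for unlabeled pixels.

    logits_*: (B, C, H, W) predictions for the weak view and two differently
    strongly augmented views of the same image. The weak view provides pixel-wise
    pseudo-labels; only pixels whose confidence exceeds `threshold` contribute,
    and both strong branches are pushed toward the same pseudo-label.
    """
    with torch.no_grad():
        probs = logits_weak.softmax(dim=1)
        conf, pseudo = probs.max(dim=1)               # (B, H, W)
        mask = (conf >= threshold).float()
    loss1 = F.cross_entropy(logits_s1, pseudo, reduction="none")
    loss2 = F.cross_entropy(logits_s2, pseudo, reduction="none")
    return ((loss1 + loss2) * mask).sum() / mask.sum().clamp(min=1.0)

l = dual_branch_consistency(torch.randn(2, 19, 32, 32),
                            torch.randn(2, 19, 32, 32),
                            torch.randn(2, 19, 32, 32))
```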