PakSign: Advancing dynamic Pakistani Sign Language Recognition with a novel skeleton-based dataset and graph-enhanced architectures
Neelma Naz, Maheen Salman, Fiza Ayub, Zawata Afnan Asif, Sara Ali
Computer Vision and Image Understanding, Volume 260, Article 104458 (published 2025-08-06). DOI: 10.1016/j.cviu.2025.104458
Abstract: Sign Language Recognition (SLR) is a critical yet complex task in pattern recognition and computer vision due to the visual-gestural nature of sign languages. While regional variants such as American, British, and Chinese Sign Languages have seen significant research advances, Pakistani Sign Language (PSL) remains underexplored, mostly limited to static Urdu alphabet recognition rather than the dynamic gestures used in daily communication. The scarcity of large-scale PSL datasets further hinders the training of deep learning models, which require extensive data. This work addresses these gaps by introducing a novel skeleton-based PSL dataset comprising over 1280 pose sequences of 52 Urdu signs, each performed five times by five different signers. We detail the data collection protocol and evaluate lightweight, pose-based baseline models using a K-fold cross-validation protocol. Furthermore, we propose Efficient-Sign, a novel recognition pipeline with two variants: B0, achieving a 2.28% accuracy gain with 35.37% fewer FLOPs and 63.55% fewer parameters, and B4, yielding a 3.48% accuracy improvement with 14.95% fewer parameters, both compared to the state-of-the-art model. We also conduct cross-dataset evaluations on widely used benchmarks such as WLASL-100 and MINDS-Libras, where Efficient-Sign maintains competitive accuracy with substantially fewer parameters and less computational overhead. These results confirm the model's generalizability and robustness across diverse sign languages and signer populations. This work contributes a publicly available pose-based PSL dataset, strong baseline evaluations, and an efficient architecture for benchmarking future research, marking a critical advancement in dynamic PSL recognition and establishing a foundation for scalable, real-world SLR systems.

Adaptive margin for unsupervised domain adaptation without source data
Ziyun Cai, Yawen Huang, Tengfei Zhang, Changhui Hu, Xiao-Yuan Jing
Computer Vision and Image Understanding, Volume 260, Article 104455 (published 2025-07-29). DOI: 10.1016/j.cviu.2025.104455
Abstract: Unsupervised domain adaptation (UDA) methods aim to transfer knowledge acquired from labeled source data to unlabeled target data. However, these methods are often inefficient and impractical due to concerns about data privacy and memory storage. Source-free domain adaptation (SFDA) was therefore introduced, in which a well-trained source model is deployed to the target domain while the source data remain unavailable for optimization. Existing pseudo-label-based SFDA methods suffer from two issues: (1) they do not fully leverage the discriminating power of the model at the early stage of training; (2) they do not effectively prevent memorization of noisy labels at the late stage of training. In this paper, we propose a novel method called AM-SFDA that addresses SFDA via an Adaptive Margin. AM-SFDA combines information maximization with the standard cross-entropy loss, which brings the source and target outputs closer. Furthermore, inspired by the early-learning phenomenon, we prevent memorization of noisy samples by assigning large weights to samples with moderate margins and small weights to samples with small margins. Extensive experiments on several source-free benchmarks under different settings show that AM-SFDA surpasses existing state-of-the-art SFDA methods.

{"title":"EPDiff: Enhancing Prior-guided Diffusion model for Real-world Image Super-Resolution","authors":"Detian Huang , Miaohua Ruan , Yaohui Guo , Zhenzhen Hu , Huanqiang Zeng","doi":"10.1016/j.cviu.2025.104453","DOIUrl":"10.1016/j.cviu.2025.104453","url":null,"abstract":"<div><div>Diffusion Models (DMs) have achieved promising success in Real-world Image Super-Resolution (Real-ISR), where they reconstruct High-Resolution (HR) images from available Low-Resolution (LR) counterparts with unknown degradation by leveraging pre-trained Text-to-Image (T2I) diffusion models. However, due to the randomness nature of DMs and the severe degradation commonly presented in LR images, most DMs-based Real-ISR methods neglect the structure-level and semantic information, which results in reconstructed HR images suffering not only from important edge missing, but also from undesired regional information confusion. To tackle these challenges, we propose an Enhancing Prior-guided Diffusion model (EPDiff) for Real-ISR, which leverages high-frequency priors and semantic guidance to generate reconstructed images with realistic details. Firstly, we design a Guide Adapter (GA) module that extracts latent texture and edge features from LR images to provide high-frequency priors. Subsequently, we introduce a Semantic Prompt Extractor (SPE) that generates high-quality semantic prompts to enhance image understanding. Additionally, we build a Feature Rectify ControlNet (FRControlNet) to refine feature modulation, enabling realistic detail generation. Extensive experiments demonstrate that the proposed EPDiff outperforms state-of-the-art methods on both synthetic and real-world datasets.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104453"},"PeriodicalIF":3.5,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144722582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crack segmentation in roads using synthetic data and RGB-D data fusion","authors":"Benedict Marsh, Ruiheng Wu","doi":"10.1016/j.cviu.2025.104452","DOIUrl":"10.1016/j.cviu.2025.104452","url":null,"abstract":"<div><div>In this paper, we use deep learning on the task of crack segmentation using a novel data fusion approach with RGB-D data. We use an existing architecture with DeepLabV3 and synthetic data to address the issue of limited availability for real-world data. The synthetic data is generated with Blender and BlenSor to accurately model the real-world crack scenarios. We train the model with a mixture of real-world data and synthetic data and evaluate it on a real-world dataset. The results show significant improvements over baseline models that only use the RGB data when evaluated with the IoU and F1-score. This demonstrates the success of using synthetic data for crack segmentation with data fusion and suggests a promising direction for future crack detection research to provide increased accuracy in automated maintenance and monitoring applications.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"260 ","pages":"Article 104452"},"PeriodicalIF":3.5,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144779905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LEAF: Unveiling two sides of the same coin in semi-supervised facial expression recognition
Fan Zhang, Zhi-Qi Cheng, Jian Zhao, Xiaojiang Peng, Xuelong Li
Computer Vision and Image Understanding, Volume 260, Article 104451 (published 2025-07-25). DOI: 10.1016/j.cviu.2025.104451
Abstract: Semi-supervised learning has emerged as a promising approach to tackle the challenge of label scarcity in the facial expression recognition (FER) task. However, current state-of-the-art methods primarily focus on one side of the coin, i.e., generating high-quality pseudo-labels, while overlooking the other side: enhancing expression-relevant representations. In this paper, we unveil both sides of the coin by proposing a unified framework termed hierarchicaL dEcoupling And Fusing (LEAF) to coordinate expression-relevant representations and pseudo-labels for semi-supervised FER. LEAF introduces a hierarchical expression-aware aggregation strategy that operates at three levels: semantic, instance, and category. (1) At the semantic and instance levels, LEAF decouples representations into expression-agnostic and expression-relevant components and adaptively fuses them using learnable gating weights. (2) At the category level, LEAF assigns ambiguous pseudo-labels by decoupling predictions into positive and negative parts, and employs a consistency loss to ensure agreement between two augmented views of the same image. Extensive experiments on benchmark datasets demonstrate that, by unveiling and harmonizing both sides of the coin, LEAF outperforms state-of-the-art semi-supervised FER methods, effectively leveraging both labeled and unlabeled data. Moreover, the proposed expression-aware aggregation strategy can be seamlessly integrated into existing semi-supervised frameworks, leading to significant performance gains. Our code is available at https://github.com/zfkarl/LEAF.

{"title":"A non-overlapping image stitching method for reconstruction of page in ancient Chinese books","authors":"Yizhou Lan, Daoyuan Zheng, Qingwu Hu, Shaohua Wang, Shunli Wang, Tong Yue, Jiayuan Li","doi":"10.1016/j.cviu.2025.104449","DOIUrl":"10.1016/j.cviu.2025.104449","url":null,"abstract":"<div><div>Automatic stitching of page images in ancient Chinese books plays an important role in preservation and transmission of cultural heritage, significantly diminishing the need for manual intervention. Current methods accomplish image stitching based on their overlapping area and struggle with the ancient book pages without overlapping areas. To overcome this hurdle, this study proposes a novel deep learning based method to accurately stitch ancient book pages, which contains three key steps. Firstly, aiming to locate stitching seams precisely, a semantic segmentation model is exploited to predict the thickness masks of page images, and the non-overlapping pages can be stitched by cropping the thickness areas. Secondly, a novel multi-rule page stitching module with two creative page alignment methods is designed to align elements along the stitching seams. Lastly, the proposed method encompasses a self-assessment module, which judiciously selects the optimal stitched outcome from the multiple probable outputs generated by the multi-rule page stitching module. Experimental results demonstrate that the proposed method achieves superior performance in automatic stitching of ancient book pages. The stitching results on over 140 pages from three different ancient books show an accuracy of 82.18%, with 37.75% improvements over existing methods. This method provides a foundation for the automatic digitization of ancient Chinese books, showing significant potential applications in the field of automatic character recognition for historical documents.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104449"},"PeriodicalIF":4.3,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144703892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic BIQA: Median randomized smoothing for certified blind image quality assessment","authors":"Ekaterina Shumitskaya , Mikhail Pautov , Dmitriy Vatolin , Anastasia Antsiferova","doi":"10.1016/j.cviu.2025.104447","DOIUrl":"10.1016/j.cviu.2025.104447","url":null,"abstract":"<div><div>Most modern No-Reference Image-Quality Assessment (NR-IQA) metrics are based on neural networks vulnerable to adversarial attacks. Although some empirical defenses for IQA metrics were proposed, they do not provide theoretical guarantees and may be vulnerable to adaptive attacks. This work focuses on developing a provably robust no-reference IQA metric. The proposed DMS-IQA method is based on randomized Median Smoothing combined with an additional convolution denoiser with ranking loss to improve the SROCC and PLCC scores of the defended IQA metric. We theoretically show that the output of the defended IQA metric changes by no more than a predefined delta for all input perturbations bounded by a given <span><math><msub><mrow><mi>l</mi></mrow><mrow><mn>2</mn></mrow></msub></math></span> norm. Compared with two prior methods on three datasets, our method exhibited superior SROCC and PLCC scores while maintaining comparable certified guarantees. We also experimentally demonstrate that embedding the DMS-IQA defended quality metric into the training of image processing algorithms can yield benefits, but it requires extra computational resources. We made the code available on GitHub.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104447"},"PeriodicalIF":4.3,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FCEGNet: Feature calibration and edge-guided MLP decoder Network for RGB-D semantic segmentation
Yiming Lu, Bin Ge, Chenxing Xia, Xu Zhu, Mengge Zhang, Mengya Gao, Ningjie Chen, Jianjun Hu, Junjie Zhi
Computer Vision and Image Understanding, Volume 260, Article 104448 (published 2025-07-17). DOI: 10.1016/j.cviu.2025.104448
Abstract: Depth images provide rich geometric information that complements traditional RGB semantic segmentation and effectively improves its performance. However, during feature fusion, biases between RGB features and depth features negatively affect cross-modal fusion. In this paper, we propose a novel RGB-D network, FCEGNet, consisting of a Feature Calibration Interaction Module (FCIM), a Three-Stream Fusion Extraction Module (TFEM), and an edge-guided MLP decoder. FCIM processes features at different orientations and scales by balancing features across modalities, and exchanges spatial information so that RGB and depth features can be calibrated and interact with cross-modal features. TFEM extracts cross-modal features and combines them with unimodal features to improve semantic understanding and fine-grained recognition. A dual-stream edge guidance module (DEGM) in the edge-guided MLP decoder preserves the consistency and disparity of cross-modal features while enhancing edge information and retaining spatial information, which helps produce more accurate segmentation results. Experimental results on RGB-D datasets show that the proposed FCEGNet is more accurate and efficient than several state-of-the-art methods. Generalization experiments on an RGB-T semantic segmentation dataset also yield strong results.

Style transfer with diffusion models for synthetic-to-real domain adaptation
Estelle Chigot, Dennis G. Wilson, Meriem Ghrib, Thomas Oberlin
Computer Vision and Image Understanding, Volume 259, Article 104445 (published 2025-07-14). DOI: 10.1016/j.cviu.2025.104445
Abstract: Semantic segmentation models trained on synthetic data often perform poorly on real-world images due to domain gaps, particularly in adverse conditions where labeled data is scarce. Yet recent foundation models can generate realistic images without any training. This paper proposes to leverage such diffusion models to improve the performance of vision models learned on synthetic data. We introduce two novel techniques for semantically consistent style transfer using diffusion models: Class-wise Adaptive Instance Normalization and Cross-Attention (CACTI) and its extension with selective attention Filtering (CACTIF). CACTI applies statistical normalization selectively based on semantic classes, while CACTIF further filters cross-attention maps based on feature similarity, preventing artifacts in regions with weak cross-attention correspondences. Our methods transfer style characteristics while preserving semantic boundaries and structural coherence, unlike approaches that apply global transformations or generate content without constraints. Experiments using GTA5 as the source and Cityscapes/ACDC as target domains show that our approach produces higher-quality images with lower FID scores and better content preservation. Our work demonstrates that class-aware diffusion-based style transfer effectively bridges the synthetic-to-real domain gap even with minimal target domain data, advancing robust perception systems for challenging real-world applications. The source code is available at: https://github.com/echigot/cactif.

{"title":"Effects of smart walker and augmented reality on gait parameters of a patient with spinocerebellar ataxia: Case report","authors":"Matheus Loureiro , Janine Valentino , Weslley Oliveira , Fabiana Machado , Arlindo Elias , Ricardo Mello , Arnaldo Leal , Anselmo Frizera","doi":"10.1016/j.cviu.2025.104446","DOIUrl":"10.1016/j.cviu.2025.104446","url":null,"abstract":"<div><div>Ataxia is a neurological condition that impairs mobility and independence in daily activities. To mitigate the symptoms, patients often seek physical therapy interventions. However, these therapies can be challenging for some individuals, depending on their level of independence, and patients may experience pain and frustration due to repetitive tasks. To address these limitations, rehabilitation robots, such as the Smart Walker (SW), can be tailored to an individual’ s degree of independence, while Augmented Reality (AR) systems can enhance patient engagement and motivation. However, the use of AR may also lead to adverse effects, such as restrictions in gait patterns and the potential of cybersickness symptoms. In this context, this paper presents a case report of a patient with ataxia to evaluate the effects of the SW and AR in three tasks: Physiotherapist-Assisted Gait (PAG), Walker-Assisted Gait (WAG), and Augmented Reality Walker-Assisted Gait (ARWAG). The results show that the use of the SW in WAG led to improvements in gait parameters, including a 27% increase in step length and a 19% increase in hip excursion in the sagittal plane. In ARWAG, these improvements were even greater, with a 58% increase in step length and a 43% increase in hip excursion in the sagittal plane. No cybersickness symptoms were observed during the ARWAG. Additionally, among all tasks, the patient expressed a preference for the ARWAG, indicating that the combination of SW and AR holds potential benefits for assisting ataxia patients in physical therapy interventions.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104446"},"PeriodicalIF":4.3,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144631710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}