Advancing heart disease diagnosis with vision-based transformer architectures applied to ECG imagery
Zeynep Hilal Kilimci, Mustafa Yalcin, Ayhan Kucukmanisa, Amit Kumar Mishra
Image and Vision Computing, vol. 162, Article 105666 (published 2025-07-19). DOI: 10.1016/j.imavis.2025.105666

Abstract: Cardiovascular disease, a critical medical condition affecting the heart and blood vessels, requires timely detection for effective clinical intervention; it includes coronary artery disease, heart failure, and myocardial infarction. Our goal is to improve the detection of heart disease through proactive interventions and personalized treatments. Early identification of at-risk individuals using advanced technologies can mitigate disease progression and reduce adverse outcomes. Leveraging recent technological advancements, we propose a novel approach to heart disease detection using vision transformer models, namely Google-ViT, Microsoft-BEiT, DeiT, and Swin-Tiny. This marks the first application of transformer models to image-based electrocardiogram (ECG) data for the detection of heart disease. The experimental results demonstrate the efficacy of vision transformers in this domain, with BEiT achieving the highest classification accuracy of 95.9% in a 5-fold cross-validation setting, further improving to 96.6% using an 80-20 holdout method. Swin-Tiny also exhibited strong performance with an accuracy of 95.2%, while Google-ViT and DeiT achieved 94.3% and 94.9%, respectively, outperforming many traditional models in ECG-based diagnostics. These findings highlight the potential of vision transformer models in enhancing diagnostic accuracy and risk stratification, and further underscore the importance of model selection in optimizing performance, with BEiT emerging as the most promising candidate. This study contributes to the growing body of research on transformer-based medical diagnostics and paves the way for future investigations into clinical applicability and generalizability.
A deep learning strategy for the 3D segmentation of colorectal tumors from ultrasound imaging
Alessandro Sebastian Podda, Riccardo Balia, Marco Manolo Manca, Jacopo Martellucci, Livio Pompianu
Image and Vision Computing, vol. 162, Article 105668 (published 2025-07-18). DOI: 10.1016/j.imavis.2025.105668

Abstract: Colorectal cancer remains a leading cause of cancer-related mortality worldwide, highlighting the need for accurate and efficient diagnostic tools. While deep learning has shown promise in medical imaging, its application to transrectal ultrasound for colorectal tumor segmentation remains underexplored. Currently, lesion segmentation is performed manually, relying on clinician expertise and leading to significant variability across treatment centers. To overcome these limitations, we propose a novel strategy that addresses both practical challenges and technical constraints, particularly in scenarios with limited data availability, offering a robust framework for accurate 3D colorectal tumor segmentation from ultrasound imaging. We evaluate eight state-of-the-art models, including convolutional neural networks and transformer-based architectures, and introduce domain-tailored pre- and post-processing techniques such as data augmentation, patching, and ensembling to enhance segmentation performance while reducing computational cost. With an average improvement of 0.423 absolute Dice points (+107%) over baseline models, our findings demonstrate the potential of our proposal to improve the accuracy and reliability of ultrasound-based diagnostics for colorectal cancer, paving the way for clinically viable AI-driven solutions.
{"title":"Distribution-modulated binary neural network for image classification","authors":"Yingcheng Lin, Yuxiao Wang, Rui Ding, Haijun Liu, Xichuan Zhou","doi":"10.1016/j.imavis.2025.105646","DOIUrl":"10.1016/j.imavis.2025.105646","url":null,"abstract":"<div><div>Deep neural networks excel at image processing tasks, but their extensive model storage and computational overhead make deployment on edge devices challenging. Binary neural networks (BNNs) have become one of the most prevailing model compression approaches due to the advantage of memory and computation efficiency. However, there exists a large performance gap between BNNs and their full-precision counterparts due to training difficulties. When training BNNs using pseudo-gradients, both dead weights and susceptible weights hinder the optimization of BNNs. To solve these two abnormal weights, in this paper, we propose a distribution-modulated binary neural network (DM-BNN), which incorporates a new regularization for dead weights (RDW) and a novel approximation with a peak-shaped derivative (APSD) for susceptible weights. In detail, RDW can supply additional gradients to eliminate dead weights and form a compact weight distribution, while APSD reduces the number of susceptible weights by facilitating the magnitude increase of susceptible weights. The achieved state-of-the-art experimental results on CIFAR-10 and ImageNet demonstrate the effectiveness of DM-BNN. Our code will be available at <span><span>https://github.com/NianKong/DM-BNN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105646"},"PeriodicalIF":4.2,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144685543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DMNet: Image dehazing via Dual-Domain Modulation
Qiqi Kou, Jiapeng Chen, Hailong Zhang, Tianshu Song, He Jiang, Deqiang Cheng, Liangliang Chen
Image and Vision Computing, vol. 162, Article 105659 (published 2025-07-16). DOI: 10.1016/j.imavis.2025.105659

Abstract: The feature representation of hazy images is directly related to the performance of dehazing models, yet existing approaches often struggle to jointly model the spatial- and frequency-domain characteristics of heterogeneous haze distributions. To address these challenges, this work presents an efficient Dual-Domain Modulation Network (DMNet) for image dehazing, which enhances the representation of uneven haze features and global feature perception by utilizing deformable convolution and an amplitude-phase guidance strategy. First, since fixed-size convolutions are inadequate for multi-scale feature extraction and inter-channel interaction, we propose the Deformable Convolutional Operator (DCM), built on a spatially non-uniform channel-interaction strategy. Through an orthogonal spatial feature aggregation mechanism, DCM aggregates spatial context to handle non-uniform haze distributions and to reconstruct fine texture details in heavily hazed regions. Second, the amplitude-centric reconstruction paradigm fails to accurately represent the nonlinear frequency-domain mapping between hazy and clear images and neglects the structural information carried by the phase. We therefore propose the Amplitude-Phase Guidance Module (APGM), which extracts global features by applying low-pass filtering to the amplitude component and high-pass filtering to the phase component. Finally, combining DCM and APGM, we propose the Dual-Domain Modulation module (DM), the core component of DMNet, which fuses the spatial and frequency domains. Extensive experiments demonstrate that DMNet performs favorably against state-of-the-art (SOTA) approaches, achieving a PSNR of over 41.77 dB with only 3.94M parameters.
MAFUNet: Mamba with adaptive fusion UNet for medical image segmentation
Minchen Yang, Ziyi Yang, Nur Intan Raihana Ruhaiyem
Image and Vision Computing, vol. 162, Article 105655 (published 2025-07-16). DOI: 10.1016/j.imavis.2025.105655

Abstract: In medical image segmentation, accurately capturing lesion contours and understanding complex lesion information is crucial, and relies on efficient collaborative modeling of local details and global contours. However, methods based on convolutional neural networks (CNNs) and transformers are limited by local receptive fields and high computational complexity, respectively, making it difficult for existing approaches to balance the two. Recently, state-space models represented by Mamba have gained attention for their significant advantages in capturing long-range dependencies with high computational efficiency. Building on these advantages, we propose Mamba with Adaptive Fusion UNet (MAFUNet). First, we design a hierarchy-aware Mamba (HAM) module, which progressively transmits local and global information across different channel branches through Mamba and balances feature contributions through a dynamic gating mechanism, improving the accuracy of lesion region recognition. The multi-scale adaptive fusion (MAF) module combines HAM, convolution blocks, and cascaded attention mechanisms to efficiently fuse lesion features at different scales, enhancing the model's robustness and precision. To address feature alignment, we propose adaptive channel attention (ACA) and adaptive spatial attention (ASA) modules: the former achieves channel enhancement through dual-scale pooling, and the latter strengthens spatial representation using a dual-path convolution strategy. Extensive experiments on three public datasets (BUSI, CVC-ClinicDB, and ISIC-2018) show that MAFUNet achieves excellent performance in medical image segmentation tasks.
MMFEIR: Multi-attention Mutual Feature Enhance and Instance Reconstruction for category-level 6D object pose estimation
Haotian Lei, Xiangyu Liu, Yan Zhou, Guo Niu, Changan Yi, Yuexia Zhou, Xiaofeng Liang, Fuhe Liu
Image and Vision Computing, vol. 162, Article 105657 (published 2025-07-16). DOI: 10.1016/j.imavis.2025.105657

Abstract: Category-level 6D object pose estimation, which predicts the rotation, translation, and size of an object, is a fundamental problem in fields such as robotic manipulation and augmented reality. Current research typically estimates the 6D pose by extracting a deformation field from the observed point cloud of the object. However, existing methods do not fully consider the interaction among the observed point cloud, the shape prior, and the image of the object, losing geometric and texture features and thereby reducing pose-estimation accuracy for objects with large intra-class configuration differences. In this paper, we propose a Multi-attention Mutual Feature Enhance Module (MMFEM) to strengthen the inherent linkages among the different perception data of an object. MMFEM enhances the interaction between images, the observed point cloud, and the shape prior through multiple attention modules, enabling the network to gain a deeper understanding of the differences between distinct instances. In addition, to improve the feature expression of geometric details, we propose the Instance Reconstruction Deformation Module (IRDM), which reconstructs the three-dimensional instance point cloud for each object, enhancing the model's ability to identify differences in geometric configuration. Extensive experiments on the CAMERA25 and REAL275 datasets show that the proposed method achieves 79.0% and 91.2% on the 3D75 metric and 52.6% and 75.9% on the 5°/2 cm metric, respectively, outperforming current mainstream methods.
AI4RDD: Artificial Intelligence and Rare Disease Diagnosis: A proposal to improve the anamnesis process
Serena Lembo, Paola Barra, Luigi Di Biasi, Thierry Bouwmans, Genoveffa Tortora
Image and Vision Computing, vol. 162, Article 105658 (published 2025-07-16). DOI: 10.1016/j.imavis.2025.105658

Abstract: Diagnosing rare and complex diseases presents significant challenges due to their inherent intricacy, limited data availability, and the need for highly skilled physicians. Traditional diagnostic processes follow a decentralized approach in which patients often consult multiple specialists and visit various healthcare facilities to determine their condition, frequently leading to delayed or inaccurate diagnoses. With over 10,000 rare diseases affecting more than 350 million people worldwide, the demand for innovative and effective diagnostic solutions is urgent.

Advances in artificial intelligence (AI) offer promising tools to tackle these challenges. AI-driven systems, such as Clinical Decision Support Systems (CDSS) and Computer-Aided Diagnosis (CAD) systems, facilitate complex medical data processing, integrate diverse datasets including imaging and genomics, and support evidence-based treatment decisions. These technologies have the potential to enable earlier and more accurate diagnoses, reduce unnecessary tests, and enhance overall healthcare efficiency.

This study proposes a framework for an AI-based CAD tool that can lead to a Distributed Knowledge Model. The framework seeks to improve diagnostic precision and patient outcomes for rare diseases worldwide, emphasizing ethical AI implementation, better data integration, and expert collaboration.
Progressive background–foreground difference enhancement for few-shot 3D point cloud semantic segmentation
Tichao Wang, Fusheng Hao, Qieshi Zhang, Jun Cheng
Image and Vision Computing, vol. 162, Article 105656 (published 2025-07-15). DOI: 10.1016/j.imavis.2025.105656

Abstract: Few-shot 3D point cloud semantic segmentation aims to segment query point clouds given only a few annotated support point clouds. Most existing methods focus on exploring the complex relationships between support and query data within the prototype-based framework. However, the commonly ignored background ambiguity issue, where the foreground of one support class is treated as background by the other support classes, severely limits few-shot models' ability to distinguish foregrounds from backgrounds, resulting in biased prototypes. In this paper, we propose a progressive background–foreground difference enhancement method to eliminate background ambiguity. First, based on the fact that background ambiguity only affects background prototypes, we develop a background–foreground difference enhancement strategy, which eliminates the ambiguity by enhancing the difference between foregrounds and backgrounds in the query data. Then, we present a geometric-guided feature aggregation module, which integrates geometric information to improve the reliability of pseudo labels. Finally, we aggregate high-confidence query features as pseudo prototypes to refine the prototypes. Iterating these steps further improves prototype quality. Comprehensive experiments show that our method achieves competitive performance on both the S3DIS and ScanNet datasets.
{"title":"Harnessing consistency for improved test-time adaptation","authors":"Dahuin Jung","doi":"10.1016/j.imavis.2025.105650","DOIUrl":"10.1016/j.imavis.2025.105650","url":null,"abstract":"<div><div>Test-time adaptation (TTA) is crucial for adjusting pre-trained models to new, unseen test data distributions without ground-truth labels, thereby addressing domain shifts commonly encountered in real-world scenarios. The most widely adopted self-training strategies in TTA include either pseudo-labeling or the minimization of prediction entropy. Different from these approaches, some research in natural language processing explored the use of consistency as a self-training objective. However, the performance improvements via consistency maximization have been limited. Based on this finding, we present a novel approach that employs consistency not as a primary self-training objective but as a metric for effective sample weighting and filtering. Our method, Consistency-TTA (CTTA), enhances performance and computational efficiency by implementing a sample weighting method that prioritizes samples demonstrating robustness to perturbations, and a sample filtering method that restricts backward pass to samples that are less prone to error accumulation. Our CTTA, which can be orthogonally combined with various state-of-the-art baselines, demonstrates performance improvements in extended adaptation tasks such as multi-modal TTA for 3D semantic segmentation and video domain adaptation. We evaluated CTTA on various corruption and natural domain shift datasets, consistently demonstrating meaningful performance improvements. Moreover, CTTA proved to be effective in both classification tasks and semantic segmentation benchmarks, such as CarlaTTA, highlighting its versatility across extended TTA applications.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105650"},"PeriodicalIF":4.2,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144653239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Hybrid Manifold Network with joint metric learning for image set classification
Yujie Wu, Hengliang Tan, Jiao Du, Shuo Yang, Guofeng Yan
Image and Vision Computing, vol. 162, Article 105647 (published 2025-07-14). DOI: 10.1016/j.imavis.2025.105647

Abstract: Many studies have shown that complex visual data exhibit non-linear and non-Euclidean characteristics, so finding an intrinsic, low-dimensional representation for such data is crucial for image set classification. Owing to the powerful data interpretation of deep neural networks and the intrinsic structural exploitation of manifold learning, deep Riemannian neural networks have demonstrated excellent performance on non-linear, non-Euclidean data. However, such networks usually focus on exploring the intrinsic structure of a single manifold, while complex visual data may contain multiple latent intrinsic structures; moreover, cross-entropy is usually adopted as the sole loss function, which may discard discriminative metric information. In this paper, we propose a deep Riemannian neural network that fuses Symmetric Positive Definite (SPD) and Grassmann manifolds to explore multiple intrinsic structures in complex visual data. We employ the Jensen–Bregman LogDet Divergence and the projection metric to construct two metric-learning regularization terms over the SPD and Grassmann manifold networks, respectively, capturing the intra-class and inter-class data distributions. The regularization terms corresponding to the different manifolds are then learned jointly with the cross-entropy loss to fuse multiple sources of loss information. Extensive experiments on expression recognition, gesture recognition, and action recognition tasks demonstrate the superior performance of the proposed Riemannian network.