{"title":"Rethinking Active Domain Adaptation: Balancing Uncertainty and Diversity","authors":"Qing Tian , Yanzhi Li , Jiangsen Yu , Junyu Shen , Weihua Ou","doi":"10.1016/j.imavis.2025.105492","DOIUrl":"10.1016/j.imavis.2025.105492","url":null,"abstract":"<div><div>In applications of machine learning, usually the test data domain distributes inconsistently with the model training data, implying they are not independent and identically distributed. To address this challenge with certain annotation knowledge, the paradigm of Active Domain Adaptation (ADA) has been proposed through selectively labeling some target instances to facilitate cross-domain alignment with minimal annotation cost. However, existing ADA methods often struggle to balance uncertainty and diversity in sample selection, limiting their effectiveness. To address this, we propose a novel ADA framework: Balancing Uncertainty and Diversity (ADA-BUD), which desirably achieves ADA while balancing the data uncertainty and diversity across domains. Specifically, in ADA-BUD, the Uncertainty Range Perception (URA) module is specially designed to distinguish these most informative but uncertain target instances for annotation while appraising not only each instance itself but also their neighbors. Subsequently, the module called Representative Energy Optimization (REO) is constructed to refine diversity of the resulting annotation instances set. Last but not least, to enhance the flexibility of ADA-BUD in handling scenarios with limited data, we further build the Dynamic Sample Enhancement (DSE) module in ADA-BUD to generate class-balanced label-confident data augmentation. Experiments show ADA-BUD outperforms existing methods on challenging benchmarks, demonstrating its practical potential.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105492"},"PeriodicalIF":4.2,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143739058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strengthening incomplete multi-view clustering: An attention contrastive learning method","authors":"Shudong Hou, Lanlan Guo, Xu Wei","doi":"10.1016/j.imavis.2025.105493","DOIUrl":"10.1016/j.imavis.2025.105493","url":null,"abstract":"<div><div>Incomplete multi-view clustering presents greater challenges than traditional multi-view clustering. In recent years, significant progress has been made in this field, multi-view clustering relies on the consistency and integrity of views to ensure the accurate transmission of data information. However, during the process of data collection and transmission, data loss is inevitable, leading to partial view loss and increasing the difficulty of joint learning on incomplete multi-view data. To address this issue, we propose a multi-view contrastive learning framework based on the attention mechanism. Previous contrastive learning mainly focused on the relationships between isolated sample pairs, which limited the robustness of the method. Our method selects positive samples from both global and local perspectives by utilizing the nearest neighbor graph to maximize the correlation between local features and latent features of each view. Additionally, we use a cross-view encoder network with self-attention structure to fuse the low dimensional representations of each view into a joint representation, and guide the learning of the joint representation through a high confidence structure. Furthermore, we introduce graph constraint learning to explore potential neighbor relationships among instances to facilitate data reconstruction. The experimental results on six multi-view datasets demonstrate that our method exhibits significant effectiveness and superiority compared to existing methods.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"157 ","pages":"Article 105493"},"PeriodicalIF":4.2,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-modal Few-shot Image Recognition with enhanced semantic and visual integration","authors":"Chunru Dong, Lizhen Wang, Feng Zhang, Qiang Hua","doi":"10.1016/j.imavis.2025.105490","DOIUrl":"10.1016/j.imavis.2025.105490","url":null,"abstract":"<div><div>Few-Shot Learning (FSL) enables models to recognize new classes with only a few examples by leveraging knowledge from known classes. Although some methods incorporate class names as prior knowledge, effectively integrating visual and semantic information remains challenging. Additionally, conventional similarity measurement techniques often result in information loss, obscure distinctions between samples, and fail to capture intra-sample diversity. To address these issues, this paper presents a Multi-modal Few-shot Image Recognition (MFSIR) approach. We first introduce the Multi-Scale Interaction Module (MSIM), which facilitates multi-scale interactions between semantic and visual features, significantly enhancing the representation of visual features. We also propose the Hybrid Similarity Measurement Module (HSMM), which integrates information from multiple dimensions to evaluate the similarity between samples by dynamically adjusting the weights of various similarity measurement methods, thereby improving the accuracy and robustness of similarity assessments. Experimental results demonstrate that our approach significantly outperforms existing methods on four FSL benchmarks, with marked improvements in FSL accuracy under 1-shot and 5-shot scenarios.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"157 ","pages":"Article 105490"},"PeriodicalIF":4.2,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143644197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object tracking based on temporal and spatial context information","authors":"Yan Chen, Tao Lin, Jixiang Du, Hongbo Zhang","doi":"10.1016/j.imavis.2025.105488","DOIUrl":"10.1016/j.imavis.2025.105488","url":null,"abstract":"<div><div>Currently, numerous advanced trackers improve stability by optimizing the target visual appearance models or by improving interactions between templates and search areas. Despite these advancements, appearance-based trackers still primarily depend on the visual information of targets without adequately integrating spatio-temporal context information, thus limiting their effectiveness in handling similar objects around the target. To address this challenge, a novel object tracking method, TSCTrack, which leverages spatio-temporal context information, has been introduced. TSCTrack overcomes the shortcomings of traditional center-cropping preprocessing techniques by introducing Global Spatial Position Embedding, effectively preserving spatial information and capturing motion data of targets. Additionally, TSCTrack incorporates a Spatial Relationship Aggregation module and a Temporal Relationship Aggregation module—the former captures static spatial context information per frame, while the latter integrates dynamic temporal context information. This sophisticated integration allows the Dynamic Tracking Prediction module to generate precise target coordinates effectively, greatly reducing the impact of target deformations and scale changes on tracking performance. Demonstrated across multiple public tracking datasets including LaSOT, TrackingNet, UAV123, GOT-10k, and OTB, TSCTrack showcases superior performance and validates its exceptional tracking capabilities in diverse scenarios.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"157 ","pages":"Article 105488"},"PeriodicalIF":4.2,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143644198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early progression detection from MCI to AD using multi-view MRI for enhanced assisted living","authors":"Nasir Rahim , Naveed Ahmad , Waseem Ullah , Jatin Bedi , Younhyun Jung","doi":"10.1016/j.imavis.2025.105491","DOIUrl":"10.1016/j.imavis.2025.105491","url":null,"abstract":"<div><div>Alzheimer's disease (AD) is a progressive neurodegenerative disorder. Early detection is crucial for timely intervention and treatment to improve assisted living. Although magnetic resonance imaging (MRI) is a widely used neuroimaging modality for the diagnosis of AD, most studies focus on a single MRI plane, missing comprehensive spatial information. In this study, we proposed a novel approach that leverages multiple MRI planes (axial, coronal, and sagittal) from 3D MRI volumes to predict progression from stable mild cognitive impairment (sMCI) to progressive MCI (pMCI) and AD. We employed a list of convolutional neural networks, including EfficientNet-B7, ConvNext, and DenseNet-121, to extract deep features from each MRI plane, followed by a feature enhancement step through an attention module. The optimized feature set was then passed through a Bayesian-optimized pool of classification heads (i.e., multilayer perceptron (MLP), long short-term memory (LSTM), and multi-head attention (MHA)) to obtain the most effective model for each MRI plane. The optimal model for each MRI plane was then integrated into homogeneous and heterogeneous ensembles to further enhance the performance of the model. Using the ADNI dataset, the proposed model achieved 91% accuracy, 87% sensitivity, 88% specificity, and 92% AUC. To enhance the interpretability of the model, we used the Grad-CAM explainability technique to generate attention maps for each MRI plane, which identified critical brain regions affected by disease progression. These attention maps revealed consistent patterns of tissue damage across the MRI scans. The results demonstrate the effectiveness of combining multiplane MRI data with ensemble learning and attention mechanisms to improve the early detection and tracking of AD progression in patients with MCI, offering a more comprehensive diagnostic tool and enhanced clinical decision-making. The datasets, results, and code used to conduct the comprehensive analysis are made available to the research community through the following link: <span><span><em>https://github.com/nasir3843/Early_Progression_detection_MCI-to_AD</em></span><svg><path></path></svg></span></div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"157 ","pages":"Article 105491"},"PeriodicalIF":4.2,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143619432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An edge-aware high-resolution framework for camouflaged object detection","authors":"Jingyuan Ma , Tianyou Chen , Jin Xiao , Xiaoguang Hu , Yingxun Wang","doi":"10.1016/j.imavis.2025.105487","DOIUrl":"10.1016/j.imavis.2025.105487","url":null,"abstract":"<div><div>Camouflaged objects are often seamlessly assimilated into their surroundings and exhibit indistinct boundaries. The complex environmental conditions and the high intrinsic similarity between camouflaged targets and their backgrounds present significant challenges in accurately locating and fully segmenting these objects. Although existing methods have achieved remarkable performance across various real-world scenarios, they still struggle with challenging cases such as small targets, thin structures, and blurred boundaries. To address these issues, we propose a novel edge-aware high-resolution network. Specifically, we design a High-Resolution Feature Enhancement Module to exploit multi-scale features while preserving local details. Furthermore, we introduce an Edge Prediction Module to generate high-quality edge prediction maps. Subsequently, we develop an Attention-Guided Fusion Module to effectively leverage the edge prediction maps. With these key modules, the proposed model achieves real-time performance at 58 FPS and surpasses 21 state-of-the-art algorithms across six standard evaluation metrics. Source code will be publicly available at <span><span>https://github.com/clelouch/EHNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"157 ","pages":"Article 105487"},"PeriodicalIF":4.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143591746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MUNet: A lightweight Mamba-based Under-Display Camera restoration network","authors":"Wenxin Wang , Boyun Li , Wanli Liu , Xi Peng , Yuanbiao Gou","doi":"10.1016/j.imavis.2025.105486","DOIUrl":"10.1016/j.imavis.2025.105486","url":null,"abstract":"<div><div>Under-Display Camera (UDC) restoration aims to recover the underlying clean images from the degraded images captured by UDC. Although promising results have been achieved, most existing UDC restoration methods still suffer from two vital obstacles in practice: (i) existing UDC restoration models are parameter-intensive, and (ii) most of them struggle to capture long-range dependencies within high-resolution images. To overcome above drawbacks, we study a challenging problem in UDC restoration, namely, how to design a lightweight UDC restoration model that could capture long-range image dependencies. To this end, we propose a novel lightweight Mamba-based UDC restoration network (MUNet) consisting of two modules, named Separate Multi-scale Mamba (SMM) and Separate Convolutional Feature Extractor (SCFE). Specifically, SMM exploits our proposed alternate scanning strategy to efficiently capture long-range dependencies across multi-scale image features. SCFE preserves local dependencies through convolutions with various receptive fields. Thanks to SMM and SCFE, MUNet achieves state-of-the-art lightweight UDC restoration performance with significantly fewer parameters, making it well-suited for deployment on mobile devices. Our codes will be available after acceptance.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105486"},"PeriodicalIF":4.2,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143577294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive scale matching for remote sensing object detection based on aerial images","authors":"Lu Han , Nan Li , Zeyuan Zhong , Dong Niu , Bingbing Gao","doi":"10.1016/j.imavis.2025.105482","DOIUrl":"10.1016/j.imavis.2025.105482","url":null,"abstract":"<div><div>Remote sensing object detection based on aerial images presents challenges due to their complex backgrounds, and the utilization of specific a contextual information can enhance detection accuracy. Inadequate long-range background information may lead to erroneous detection of small remotely sensed objects, with variations in background complexity observed across different object types. In this paper, we propose a new <strong>YOLO</strong>-based real-time object detector. The detector aims to <strong>S</strong>cale-<strong>M</strong>atch the proportions of various objects in remote sensing images using the model named <strong>YOLO-SM</strong>. Specifically, this paper proposes a straightforward yet highly efficient building block that dynamically adjusts the necessary receptive field for each object, minimizing the loss of feature information caused by consecutive convolutions. Additionally, a supplementary bottom-up pathway is incorporated to improve the representation of smaller objects. Empirical evaluations conducted on DOTA-v1.0, DOTA-v1.5, DIOR-R, and HRSC2016 datasets confirm the efficacy of the proposed methodology. On DOTA-v1.0, compared to RTMDet-R-L, YOLO-SM-S achieved competitive accuracy while significantly reducing parameters by 74.8% and FLOPs by 78.5%. Compared to LSKNet on HRSC2016, YOLO-SM-Tiny dramatically reduces 76% of parameters and 90% of FLOPs and improves FPS by about three times while maintaining stable accuracy.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"157 ","pages":"Article 105482"},"PeriodicalIF":4.2,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143619431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning for brain tumor segmentation in multimodal MRI images: A review of methods and advances","authors":"Bin Jiang , Maoyu Liao , Yun Zhao , Gen Li , Siyu Cheng , Xiangkai Wang , Qingling Xia","doi":"10.1016/j.imavis.2025.105463","DOIUrl":"10.1016/j.imavis.2025.105463","url":null,"abstract":"<div><h3>Background and Objectives:</h3><div>Image segmentation is crucial in applications like image understanding, feature extraction, and analysis. The rapid development of deep learning techniques in recent years has significantly enhanced the field of medical image processing, with the process of segmenting tumor from MRI images of the brain emerging as a particularly active area of interest within the medical science community. Existing reviews predominantly focus on traditional CNNs and Transformer models but lack systematic analysis and experimental validation on the application of the emerging Mamba architecture in multimodal brain tumor segmentation, the handling of missing modalities, the potential of multimodal fusion strategies, and the heterogeneity of datasets.</div></div><div><h3>Methods:</h3><div>This paper provides a comprehensive literature review of recent deep learning-based methods for multimodal brain tumor segmentation using multimodal MRI images, including performance and quantitative analysis of state-of-the-art approaches. It focuses on the handling of multimodal fusion, adaptation techniques, and missing modality, while also delving into the performance, advantages, and disadvantages of deep learning models such as U-Net, Transformer, hybrid deep learning, and Mamba-based methods in segmentation tasks.</div></div><div><h3>Results:</h3><div>Through the entire review process, It is found that most researchers preferred to use the Transformer-based U-Net model and mamba-based U-Net, especially the fusion model combination of U-Net and mamba, for image segmentation.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105463"},"PeriodicalIF":4.2,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143563582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dense small target detection algorithm for UAV aerial imagery","authors":"Sheng Lu , Yangming Guo , Jiang Long , Zun Liu , Zhuqing Wang , Ying Li","doi":"10.1016/j.imavis.2025.105485","DOIUrl":"10.1016/j.imavis.2025.105485","url":null,"abstract":"<div><div>Unmanned aerial vehicle (UAV) aerial images make dense small target detection challenging due to the complex background, small object size in the wide field of view, low resolution, and dense target distribution. Many aerial target detection networks and attention-based methods have been proposed to enhance the capability of dense small target detection, but there are still problems, such as insufficient effective information extraction, missed detection, and false detection of small targets in dense areas. Therefore, this paper proposes a novel dense small target detection algorithm (DSTDA) for UAV aerial images suitable for various high-altitude complex environments. The core component of the proposed DSTDA consists of the multi-axis attention units, the adaptive feature transformation mechanism, and the target-guided sample allocation strategy. Firstly, by introducing the multi-axis attention units into DSTDA, the limitation of DSTDA on global information perception can be addressed. Thus, the detailed features and spatial relationships of small targets at long distances can be sufficiently extracted by our proposed algorithm. Secondly, an adaptive feature transformation mechanism is designed to flexibly adjust the feature map according to the characteristics of the target distribution, which enables the DSTDA to focus more on densely populated target areas. Lastly, a goal-oriented sample allocation strategy is presented, combining coarse screening based on positional information and fine screening guided by target prediction information. By employing this dynamic sample allocation from coarse to fine, the detection performance of small and dense targets in complex backgrounds is further improved. These above innovative improvements empower the DSTDA with enhanced global perception and target-focusing capabilities, effectively addressing the challenges of detecting dense small targets in complex aerial scenes. Experimental validation was conducted on three publicly available datasets: VisDrone, SIMD, and CARPK. The results showed that the proposed DSTDA outperforms other state-of-the-art algorithms in terms of comprehensive performance. The algorithm significantly improves the issues of false alarms and missed detection in drone-based target detection, showcasing remarkable accuracy and real-time performance. It proves to be proficient in the task of detecting dense small targets in drone scenarios.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105485"},"PeriodicalIF":4.2,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143563580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}