Image and Vision Computing: Latest Articles

Advanced deep learning and large language models: Comprehensive insights for cancer detection
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2025-03-24 DOI: 10.1016/j.imavis.2025.105495
Yassine Habchi , Hamza Kheddar , Yassine Himeur , Adel Belouchrani , Erchin Serpedin , Fouad Khelifi , Muhammad E.H. Chowdhury
Abstract: In recent years, the rapid advancement of machine learning (ML), particularly deep learning (DL), has revolutionized various fields, with healthcare being one of the most notable beneficiaries. DL has demonstrated exceptional capabilities in addressing complex medical challenges, including the early detection and diagnosis of cancer. Its superior performance, surpassing both traditional ML methods and human accuracy, has made it a critical tool in identifying and diagnosing diseases such as cancer. Despite the availability of numerous reviews on DL applications in healthcare, a comprehensive and detailed understanding of DL's role in cancer detection remains lacking. Most existing studies focus on specific aspects of DL, leaving significant gaps in the broader knowledge base. This paper aims to bridge these gaps by offering a thorough review of advanced DL techniques, namely transfer learning (TL), reinforcement learning (RL), federated learning (FL), Transformers, and large language models (LLMs). These cutting-edge approaches push the boundaries of cancer detection by enhancing model accuracy, addressing data scarcity, and enabling decentralized learning across institutions while maintaining data privacy. TL enables the adaptation of pre-trained models to new cancer datasets, significantly improving performance with limited labeled data. RL is emerging as a promising method for optimizing diagnostic pathways and treatment strategies, while FL enables collaborative model development without sharing sensitive patient data. Furthermore, Transformers and LLMs, traditionally used in natural language processing (NLP), are now being applied to medical data for enhanced interpretability and context-based predictions. In addition, this review explores the efficiency of the aforementioned techniques in cancer diagnosis, addresses key challenges such as data imbalance, and proposes potential solutions. It aims to be a valuable resource for researchers and practitioners, offering insights into current trends and guiding future research in the application of advanced DL techniques for cancer detection.

Image and Vision Computing, Volume 157, Article 105495.
Citations: 0
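Of the techniques this survey covers, transfer learning is the most mechanically simple: freeze a pre-trained feature extractor and fit only a small head on the scarce labeled target data. A toy NumPy sketch of that idea (all names and data here are illustrative, not from the paper):

```python
import numpy as np

# Toy transfer-learning sketch (illustrative only, not the paper's code):
# a "pre-trained" extractor is kept frozen and only a small logistic
# head is fitted on scarce labeled target data.

def frozen_extractor(x, W_pre):
    """Stand-in for a frozen pre-trained backbone: fixed projection + ReLU."""
    return np.maximum(x @ W_pre, 0.0)

def train_head(features, labels, lr=0.1, steps=200):
    """Fit a logistic-regression head on the frozen features."""
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        logits = np.clip(features @ w, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-logits))
        w -= lr * features.T @ (p - labels) / len(labels)
    return w

rng = np.random.default_rng(0)
W_pre = rng.normal(size=(8, 16))      # frozen "pre-trained" weights
x = rng.normal(size=(64, 8))          # small labeled target set (toy data)
y = (x[:, 0] > 0).astype(float)       # toy binary label
feats = frozen_extractor(x, W_pre)
w = train_head(feats, y)
train_acc = np.mean(((feats @ w) > 0) == y)
```

Only `w` is updated; the backbone stays fixed, which is why TL works with limited labels.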
Spatio-temporal information mining and fusion feature-guided modal alignment for video-based visible-infrared person re-identification
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2025-03-22 DOI: 10.1016/j.imavis.2025.105518
Zhigang Zuo, Huafeng Li, Yafei Zhang, Minghong Xie
Abstract: Video-based visible-infrared person re-identification (Re-ID) aims to recognize the same person across modalities through video sequences. The core challenges of this task lie in narrowing the modal differences and deeply mining the rich spatio-temporal information contained in video to enhance model performance. However, existing research primarily focuses on addressing the modality gap, with insufficient utilization of the spatio-temporal information in video sequences. To address this, this paper proposes a novel spatio-temporal information mining and fusion feature-guided modal alignment framework for video-based visible-infrared person Re-ID. Specifically, we propose a spatio-temporal information mining method that employs a feature correlation mechanism to enhance the discriminative features of a person across different frames, while utilizing a temporal Transformer to mine person motion features. The advantage of this method lies in its ability to alleviate issues such as occlusion and frame misalignment, improving the discriminability of person features. Additionally, we introduce a fusion modality-guided modal alignment strategy, which reduces modality differences between infrared and visible video frames by aligning single-modality features with fusion features. The advantage of this strategy is that each modality not only learns its specific features but also absorbs person information from the other modality, thereby alleviating modality differences and further enhancing the discriminability of person features. Extensive comparative and ablation experiments conducted on the HITSZ-VCM and BUPTCampus datasets confirm the effectiveness and superiority of the proposed framework. The source code is available at https://github.com/lhf12278/SIMFGA.

Image and Vision Computing, Volume 157, Article 105518.
Citations: 0
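The cross-frame feature correlation described above can be caricatured in a few lines: frames whose features agree with the rest of the sequence receive higher fusion weights, damping occluded or misaligned frames. This is an illustrative reading, not the authors' SIMFGA code:

```python
import numpy as np

# Caricature of inter-frame feature correlation (our illustrative reading,
# not the authors' SIMFGA code): frames that agree with the sequence
# consensus get larger fusion weights, damping occluded frames.

def correlate_and_fuse(frame_feats):
    """frame_feats: (T, D) per-frame features -> ((D,) fused, (T,) weights)."""
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    sim = f @ f.T                              # (T, T) cosine similarities
    score = sim.mean(axis=1)                   # agreement with other frames
    w = np.exp(score) / np.exp(score).sum()    # softmax frame weights
    return w @ frame_feats, w
```

A sequence with one occluded (dissimilar) frame assigns that frame the smallest weight.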
MFKD: Multi-dimensional feature alignment for knowledge distillation
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2025-03-22 DOI: 10.1016/j.imavis.2025.105514
Zhen Guo , Pengzhou Zhang , Peng Liang
Abstract: Knowledge distillation is a popular technique for compressing and transferring models in deep learning. However, existing distillation methods often focus on optimizing a single dimension and overlook the importance of aligning and transforming knowledge across multiple dimensions, leading to suboptimal results. In this article, we introduce a novel approach called multi-dimensional feature alignment for knowledge distillation (MFKD) to address this limitation. The MFKD framework is built on the observation that knowledge from different dimensions can complement each other effectively. We extract knowledge from features in the spatial, sample, and channel dimensions separately. Our spatial-level part separates foreground and background information, guiding the student to focus on crucial image regions by mimicking the teacher's spatial and channel attention maps. Our sample-level part distills knowledge encoded in semantic correlations between sample activations by aligning the student's activations to emulate the teacher's clustering patterns using the Spearman correlation coefficient. Furthermore, our channel-level part encourages the student to learn standardized feature representations aligned with the teacher's channel-wise interdependencies. Finally, we dynamically balance the loss factors of the different dimensions to optimize the overall performance of the distillation process. To validate the effectiveness of our methodology, we conduct experiments on benchmark datasets such as CIFAR-100, ImageNet, and COCO. The experimental results demonstrate substantial performance improvements compared to baseline and recent state-of-the-art methods, confirming the efficacy of our MFKD framework. Furthermore, we provide a comprehensive analysis of the experimental results, offering deeper insight into the benefits and effectiveness of our approach. Through this analysis, we reinforce the significance of aligning and leveraging knowledge across multiple dimensions in knowledge distillation.

Image and Vision Computing, Volume 157, Article 105514.
Citations: 0
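The spatial-level idea of mimicking the teacher's attention maps is in the spirit of classic attention transfer; a minimal sketch (function names are ours, not MFKD's implementation): collapse channels into a normalized spatial attention map and penalize the student for deviating from the teacher's map.

```python
import numpy as np

# Minimal attention-transfer sketch (one ingredient of the spatial-level
# part as we read it; names are ours, not MFKD's code): channels are
# collapsed into a normalized spatial map, and the student is penalized
# for deviating from the teacher's map.

def spatial_attention(feat):
    """feat: (C, H, W) -> flattened, L2-normalized spatial map of length H*W."""
    a = (feat ** 2).mean(axis=0).ravel()
    return a / (np.linalg.norm(a) + 1e-12)

def attention_loss(student_feat, teacher_feat):
    diff = spatial_attention(student_feat) - spatial_attention(teacher_feat)
    return float(np.sum(diff ** 2))
```

Because the maps are normalized, the loss is invariant to feature scale, so student and teacher need not share channel counts or magnitudes.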
Attention head purification: A new perspective to harness CLIP for domain generalization
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2025-03-22 DOI: 10.1016/j.imavis.2025.105511
Yingfan Wang, Guoliang Kang
Abstract: Domain Generalization (DG) aims to learn a model from multiple source domains that achieves satisfactory performance on unseen target domains. Recent works introduce CLIP to DG tasks due to its superior image-text alignment and zero-shot performance. Previous methods utilize either full fine-tuning or prompt-learning paradigms to harness CLIP for DG. Those works focus on avoiding catastrophic forgetting of the original knowledge encoded in CLIP, but ignore that this knowledge may inherently contain domain-specific cues that constrain its domain generalization performance. In this paper, we propose a new perspective on harnessing CLIP for DG, i.e., attention head purification. We observe that different attention heads may encode different properties of an image, and that selecting heads appropriately may yield remarkable performance improvements across domains. Based on these observations, we purify the attention heads of CLIP at two levels: task-level purification and domain-level purification. For task-level purification, we design head-aware LoRA to make each head more adapted to the task at hand. For domain-level purification, we perform head selection via a simple gating strategy, utilizing an MMD loss to encourage masked head features to be more domain-invariant and thus emphasize more generalizable properties/heads. During training, we jointly perform task-level and domain-level purification. We conduct experiments on various representative DG benchmarks. Though simple, extensive experiments demonstrate that our method performs favorably against previous state-of-the-art methods.

Image and Vision Computing, Volume 157, Article 105511.
Citations: 0
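The MMD loss used for domain-level purification can be sketched generically; the RBF kernel and bandwidth below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

# Hedged sketch of a squared-MMD penalty of the kind used to push masked
# head features toward domain invariance. The RBF kernel and bandwidth
# (gamma) are illustrative choices, not the paper's configuration.

def rbf(a, b, gamma=1.0):
    """Pairwise RBF kernel matrix between a:(n,d) and b:(m,d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD between samples x:(n,d) and y:(m,d)."""
    return (rbf(x, x, gamma).mean()
            + rbf(y, y, gamma).mean()
            - 2.0 * rbf(x, y, gamma).mean())
```

Minimizing `mmd2` over features from two domains pulls their distributions together; it is zero when both samples are identical and grows as the domains separate.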
MrgaNet: Multi-scale recursive gated aggregation network for tracheoscopy images
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2025-03-21 DOI: 10.1016/j.imavis.2025.105503
Ying Wang , Yun Tie , Dalong Zhang , Fenghui Liu , Lin Qi
Abstract: Lung cancer is a potentially fatal disease worldwide, and improving diagnostic accuracy plays a key role in enhancing patient outcomes. In this study, we extend computer-aided diagnosis to the task of assisting tracheoscopy in predicting lung cancer subtypes. To address the problem of fusing information across different spatial scales and channels, we propose MrgaNet. The network enhances classification performance by expanding interactions from low to high orders, dynamically adjusting feature weights, and incorporating a channel competition operator for efficient feature selection. Our network achieves a precision of 0.87 on the endobronchial dataset. In addition, accuracies of 89.25% and 96.76% are achieved on the Kvasir-v2 and Kvasir-Capsule datasets, respectively. The results demonstrate that MrgaNet achieves superior performance compared to existing leading methods.

Image and Vision Computing, Volume 158, Article 105503.
Citations: 0
Part-aware distillation and aggregation network for human parsing
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2025-03-21 DOI: 10.1016/j.imavis.2025.105504
Yuntian Lai, Yuxin Feng, Fan Zhou, Zhuo Su
Abstract: Current state-of-the-art human parsing models achieve remarkable parsing accuracy. However, their huge model size and computational cost restrict their application in low-latency online systems and on resource-limited mobile devices. In this paper, we propose a novel part-aware distillation and aggregation network for human parsing, which can be applied to any human parsing model to achieve a good trade-off between accuracy and efficiency. We design part key-point similarity distillation and part distribution distillation to transfer the complex teacher model's knowledge of part structural and spatial relationships to a lightweight student model, helping the latter to better identify small parts and semantic boundaries and to distinguish easily confused categories. Furthermore, an online model aggregation module is introduced in the later stages of training, which mitigates noise from both the teacher and the labels to obtain smoother and more robust results. Extensive experiments and ablation studies on the large-scale popular human parsing datasets LIP, ATR, and PASCAL-Person-Part fully demonstrate that our method is accurate, lightweight, and general.

Image and Vision Computing, Volume 158, Article 105504.
Citations: 0
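Distillation of part-structural relationships of this flavor is often implemented as relation matching: the student reproduces the teacher's pairwise part-similarity structure. A hedged sketch (ours, not the paper's exact loss):

```python
import numpy as np

# Relation-matching sketch (our illustration of part-structure
# distillation, not the paper's loss): encode pairwise similarities among
# part features and ask the student to reproduce the teacher's relations.

def part_relation(parts):
    """parts: (P, D), one feature per body part -> (P, P) cosine-similarity matrix."""
    f = parts / np.linalg.norm(parts, axis=1, keepdims=True)
    return f @ f.T

def relation_loss(student_parts, teacher_parts):
    diff = part_relation(student_parts) - part_relation(teacher_parts)
    return float((diff ** 2).mean())
```

Matching relations rather than raw features lets teacher and student differ in feature scale while still sharing the same part structure.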
DDMCB: Open-world object detection empowered by Denoising Diffusion Models and Calibration Balance
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2025-03-19 DOI: 10.1016/j.imavis.2025.105508
Yangyang Huang, Xing Xi, Ronghua Luo
Abstract: Open-world object detection (OWOD) differs from traditional object detection by being better suited to real-world, dynamic scenarios: it aims to recognize unseen objects and to learn incrementally from newly introduced knowledge. However, current OWOD methods usually rely on supervision from known objects to identify unknown ones, using high objectness scores as the key indicator of potential unknown objects. While these methods can detect unknown objects whose features resemble known objects, they also classify regions dissimilar to known objects as background, leading to label bias. To address this problem, we leverage knowledge from large vision models to provide auxiliary supervision for unknown objects. Additionally, we apply the Denoising Diffusion Probabilistic Model (DDPM) to OWOD scenarios, proposing an unsupervised modeling approach based on DDPM that significantly improves the accuracy of unknown object detection. Even so, the classifier encounters only known classes during training, resulting in higher confidence for known classes at inference time, so bias issues arise again. We therefore propose a probability calibration technique for post-processing predictions during inference: the calibration reduces the probabilities of known objects and increases the probabilities of unknown objects, balancing the final probability predictions. Our experiments demonstrate that the proposed method achieves significant improvements on OWOD benchmarks, with an unknown-object recall of 54.7 U-Recall, surpassing current state-of-the-art (SOTA) methods by 44.3%. In terms of real-time performance, our model uses few parameters and pure convolutional neural networks instead of intensive attention mechanisms, achieving an inference speed of 35.04 FPS and exceeding SOTA OWOD methods based on Faster R-CNN and Deformable DETR by 2.79 and 10.95 FPS, respectively.

Image and Vision Computing, Volume 157, Article 105508.
Citations: 0
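The probability calibration step admits a very simple post-hoc form: damp known-class scores and renormalize so the unknown class competes on fairer terms. The scaling factor below is an illustrative assumption, not the paper's exact rule:

```python
import numpy as np

# Post-hoc calibration sketch (the scaling factor `alpha` is our
# illustrative assumption, not the paper's exact rule): damp known-class
# probabilities and renormalize, which raises the unknown-class share.

def calibrate(probs, unknown_idx, alpha=0.7):
    """probs: (N, K) softmax outputs; scale known classes by alpha, renormalize."""
    p = probs.copy()
    known = np.arange(p.shape[1]) != unknown_idx
    p[:, known] *= alpha
    return p / p.sum(axis=1, keepdims=True)
```

After calibration every row still sums to one, but the unknown class's probability strictly increases whenever any known-class mass was present.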
Self-supervised monocular depth learning from unknown cameras: Leveraging the power of raw data
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2025-03-19 DOI: 10.1016/j.imavis.2025.105505
Xiaofei Qin , Yongchao Zhu , Lin Wang , Xuedian Zhang , Changxiang He , Qiulei Dong
Abstract: Self-supervised monocular depth estimation from wild videos with unknown camera intrinsics is a practical and challenging task in computer vision. Most existing methods employ a camera decoder and a pose decoder to estimate camera intrinsics and poses, respectively; however, their performance degrades significantly in complex scenarios with severe noise and large camera rotations. To address this problem, we propose a novel self-supervised monocular depth estimation method that can be trained from wild videos with a joint optimization strategy for simultaneously estimating camera intrinsics and poses. In the proposed method, a depth encoder is employed to learn scene depth features, and, taking these features as inputs, a Neighborhood Influence Module (NIM) is designed to predict each pixel's depth by fusing the depths of its neighboring pixels, which explicitly enforces depth accuracy. In addition, a knowledge distillation mechanism is introduced to learn a lightweight depth encoder from a large-scale depth encoder, achieving a balance between computational speed and accuracy. Experimental results on four public datasets demonstrate that the proposed method outperforms state-of-the-art methods in most cases. Moreover, when the proposed method is trained on a mixture of different datasets, its performance is further boosted compared with training on each individual dataset. Code is available at: https://github.com/ZhuYongChaoUSST/IntrLessMonoDepth.

Image and Vision Computing, Volume 157, Article 105505.
Citations: 0
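The NIM's core idea, predicting a pixel's depth by fusing the depths of its neighbors, can be caricatured as a normalized weighted average over a 3x3 neighborhood (the layout and fixed weights here are our illustrative assumptions; the module itself learns its weighting):

```python
import numpy as np

# Toy neighbor-fusion sketch in the spirit of the NIM (layout and fixed
# weights are our illustrative assumptions; the real module learns its
# weighting): each pixel's depth becomes a normalized weighted average
# over its 3x3 neighborhood, with edge padding at the borders.

def fuse_neighborhood(depth, weights):
    """depth: (H, W); weights: (3, 3) non-negative -> (H, W) fused depth."""
    H, W = depth.shape
    pad = np.pad(depth, 1, mode="edge")
    out = np.zeros_like(depth)
    for dy in range(3):
        for dx in range(3):
            out += weights[dy, dx] * pad[dy:dy + H, dx:dx + W]
    return out / weights.sum()
```

Fusion leaves smooth regions untouched (a constant depth map stays constant) while pulling isolated outlier depths toward their neighbors.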
Rif-Diff: Improving image fusion based on diffusion model via residual prediction
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2025-03-19 DOI: 10.1016/j.imavis.2025.105494
Peixuan Wu, Shen Yang, Jin Wu, Qian Li
Abstract: This paper proposes Rif-Diff, an image fusion framework that adopts several strategies and approaches to improve current diffusion-based fusion methods. Rif-Diff employs residual images as the generation target of the diffusion model to optimize the model's convergence and enhance fusion performance. For fusion tasks lacking ground truth, an image fusion prior is utilized to facilitate the production of residual images. Simultaneously, to overcome the limits that training with this prior places on the model's learning capacity, Rif-Diff introduces ideas from image restoration so that the initial fused images incorporate more of the expected information. Additionally, a dual-step decision module is designed to address the blurriness of fused images in existing multi-focus image fusion methods that do not rely on decision maps. Extensive experiments demonstrate the effectiveness of Rif-Diff across multiple fusion tasks, including multi-focus, multi-exposure, and infrared-visible image fusion. The code is available at: https://github.com/peixuanWu/Rif-Diff.

Image and Vision Computing, Volume 157, Article 105494.
Citations: 0
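The residual-prediction idea is easy to state concretely: the diffusion model is trained to generate the difference between the target and an initial fused image, and reconstruction adds that difference back. A minimal sketch (not Rif-Diff's network):

```python
import numpy as np

# Residual-prediction sketch (illustrative, not Rif-Diff's network): the
# generative model's training target is the residual between the target
# fused image and an initial fused image; reconstruction adds it back.

def residual_target(gt_fused, initial_fused):
    """The image the generative model is trained to produce."""
    return gt_fused - initial_fused

def reconstruct(initial_fused, predicted_residual):
    """Final fused image = initial estimate + predicted residual."""
    return initial_fused + predicted_residual
```

With a perfect residual prediction, reconstruction recovers the target exactly; in practice the residual is typically closer to zero-mean than the full image, which can ease convergence.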
Rethinking Active Domain Adaptation: Balancing Uncertainty and Diversity
IF 4.2 | CAS Tier 3 | Computer Science
Image and Vision Computing Pub Date : 2025-03-17 DOI: 10.1016/j.imavis.2025.105492
Qing Tian , Yanzhi Li , Jiangsen Yu , Junyu Shen , Weihua Ou
Abstract: In machine learning applications, the test data often follow a distribution different from the training data; that is, they are not independent and identically distributed. To address this challenge with limited annotation knowledge, the paradigm of Active Domain Adaptation (ADA) selectively labels some target instances to facilitate cross-domain alignment at minimal annotation cost. However, existing ADA methods often struggle to balance uncertainty and diversity in sample selection, limiting their effectiveness. We therefore propose a novel ADA framework, Balancing Uncertainty and Diversity (ADA-BUD), which performs ADA while balancing data uncertainty and diversity across domains. Specifically, in ADA-BUD the Uncertainty Range Perception (URA) module is designed to identify the most informative yet uncertain target instances for annotation, appraising not only each instance itself but also its neighbors. The Representative Energy Optimization (REO) module then refines the diversity of the resulting set of annotated instances. Finally, to enhance the flexibility of ADA-BUD in scenarios with limited data, we further build a Dynamic Sample Enhancement (DSE) module to generate class-balanced, label-confident data augmentation. Experiments show that ADA-BUD outperforms existing methods on challenging benchmarks, demonstrating its practical potential.

Image and Vision Computing, Volume 158, Article 105492.
Citations: 0
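A generic way to balance uncertainty and diversity is to shortlist by predictive entropy and then pick a spread-out subset by farthest-point traversal in feature space. The sketch below is that generic scheme, not ADA-BUD itself; the URA and REO modules are more elaborate than this caricature:

```python
import numpy as np

# Generic uncertainty + diversity selection sketch (illustrative; not
# ADA-BUD's URA/REO): shortlist the most uncertain samples by predictive
# entropy, then pick a spread-out subset by farthest-point traversal.

def entropy(probs):
    """Predictive entropy per row of an (N, K) probability matrix."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select(probs, feats, n_shortlist, n_pick):
    order = np.argsort(-entropy(probs), kind="stable")
    short = order[:n_shortlist]                 # most uncertain candidates
    picked = [int(short[0])]
    while len(picked) < n_pick:
        # distance of each candidate to its nearest already-picked sample
        d = np.linalg.norm(feats[short][:, None] - feats[picked][None], axis=2)
        picked.append(int(short[int(np.argmax(d.min(axis=1)))]))
    return picked
```

The entropy filter supplies uncertainty; the farthest-point step supplies diversity, so near-duplicate uncertain samples are not all labeled.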