Computer Vision and Image Understanding: Latest Articles

Effects of smart walker and augmented reality on gait parameters of a patient with spinocerebellar ataxia: Case report
IF 4.3 · CAS Zone 3 · Computer Science
Computer Vision and Image Understanding, Volume 259, Article 104446 · Pub Date: 2025-07-14 · DOI: 10.1016/j.cviu.2025.104446
Matheus Loureiro, Janine Valentino, Weslley Oliveira, Fabiana Machado, Arlindo Elias, Ricardo Mello, Arnaldo Leal, Anselmo Frizera
{"title":"Effects of smart walker and augmented reality on gait parameters of a patient with spinocerebellar ataxia: Case report","authors":"Matheus Loureiro ,&nbsp;Janine Valentino ,&nbsp;Weslley Oliveira ,&nbsp;Fabiana Machado ,&nbsp;Arlindo Elias ,&nbsp;Ricardo Mello ,&nbsp;Arnaldo Leal ,&nbsp;Anselmo Frizera","doi":"10.1016/j.cviu.2025.104446","DOIUrl":"10.1016/j.cviu.2025.104446","url":null,"abstract":"<div><div>Ataxia is a neurological condition that impairs mobility and independence in daily activities. To mitigate the symptoms, patients often seek physical therapy interventions. However, these therapies can be challenging for some individuals, depending on their level of independence, and patients may experience pain and frustration due to repetitive tasks. To address these limitations, rehabilitation robots, such as the Smart Walker (SW), can be tailored to an individual’ s degree of independence, while Augmented Reality (AR) systems can enhance patient engagement and motivation. However, the use of AR may also lead to adverse effects, such as restrictions in gait patterns and the potential of cybersickness symptoms. In this context, this paper presents a case report of a patient with ataxia to evaluate the effects of the SW and AR in three tasks: Physiotherapist-Assisted Gait (PAG), Walker-Assisted Gait (WAG), and Augmented Reality Walker-Assisted Gait (ARWAG). The results show that the use of the SW in WAG led to improvements in gait parameters, including a 27% increase in step length and a 19% increase in hip excursion in the sagittal plane. In ARWAG, these improvements were even greater, with a 58% increase in step length and a 43% increase in hip excursion in the sagittal plane. No cybersickness symptoms were observed during the ARWAG. Additionally, among all tasks, the patient expressed a preference for the ARWAG, indicating that the combination of SW and AR holds potential benefits for assisting ataxia patients in physical therapy interventions.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104446"},"PeriodicalIF":4.3,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144631710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Style transfer with diffusion models for synthetic-to-real domain adaptation
IF 4.3 · CAS Zone 3 · Computer Science
Computer Vision and Image Understanding, Volume 259, Article 104445 · Pub Date: 2025-07-14 · DOI: 10.1016/j.cviu.2025.104445
Estelle Chigot, Dennis G. Wilson, Meriem Ghrib, Thomas Oberlin
{"title":"Style transfer with diffusion models for synthetic-to-real domain adaptation","authors":"Estelle Chigot ,&nbsp;Dennis G. Wilson ,&nbsp;Meriem Ghrib ,&nbsp;Thomas Oberlin","doi":"10.1016/j.cviu.2025.104445","DOIUrl":"10.1016/j.cviu.2025.104445","url":null,"abstract":"<div><div>Semantic segmentation models trained on synthetic data often perform poorly on real-world images due to domain gaps, particularly in adverse conditions where labeled data is scarce. Yet, recent foundation models enable to generate realistic images without any training. This paper proposes to leverage such diffusion models to improve the performance of vision models when learned on synthetic data. We introduce two novel techniques for semantically consistent style transfer using diffusion models: Class-wise Adaptive Instance Normalization and Cross-Attention (<span><math><mtext>CACTI</mtext></math></span>) and its extension with selective attention Filtering (<span><math><msub><mrow><mtext>CACTI</mtext></mrow><mrow><mtext>F</mtext></mrow></msub></math></span>). <span><math><mtext>CACTI</mtext></math></span> applies statistical normalization selectively based on semantic classes, while <span><math><msub><mrow><mtext>CACTI</mtext></mrow><mrow><mtext>F</mtext></mrow></msub></math></span> further filters cross-attention maps based on feature similarity, preventing artifacts in regions with weak cross-attention correspondences. Our methods transfer style characteristics while preserving semantic boundaries and structural coherence, unlike approaches that apply global transformations or generate content without constraints. Experiments using GTA5 as source and Cityscapes/ACDC as target domains show that our approach produces higher quality images with lower FID scores and better content preservation. Our work demonstrates that class-aware diffusion-based style transfer effectively bridges the synthetic-to-real domain gap even with minimal target domain data, advancing robust perception systems for challenging real-world applications. The source code is available at: <span><span>https://github.com/echigot/cactif</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104445"},"PeriodicalIF":4.3,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144655382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
S2DNet: A self-supervised deraining network using monocular videos
IF 4.3 · CAS Zone 3 · Computer Science
Computer Vision and Image Understanding, Volume 259, Article 104444 · Pub Date: 2025-07-10 · DOI: 10.1016/j.cviu.2025.104444
Aman Kumar, Aditya Mohan, A.N. Rajagopalan
{"title":"S2DNet: A self-supervised deraining network using monocular videos","authors":"Aman Kumar,&nbsp;Aditya Mohan,&nbsp;A.N. Rajagopalan","doi":"10.1016/j.cviu.2025.104444","DOIUrl":"10.1016/j.cviu.2025.104444","url":null,"abstract":"<div><div>Rainy conditions degrade the visual quality of images, thus presenting significant challenges for various vision-based downstream tasks. Traditional deraining approaches often rely on supervised learning methods requiring large, paired datasets of rainy and clean images. However, due to the dynamic and complex nature of rain, compiling such datasets is challenging and often insufficient for training robust models. As a result, researchers often resort to synthetic datasets. However, synthetic datasets have limitations because they often lack realism, can introduce biases, and seldom capture the diversity of real rain scenes. We propose a self-supervised method for image deraining using monocular videos that leverages the fact that rain moves spatially across frames, independently of the static elements in a scene, thus enabling isolation of rain-affected regions. We utilize depth information from the target frame and the camera’s relative pose (translations and rotations) across frames to achieve scene alignment. We apply a view-synthesis constraint that warps features from adjacent frames to the target frame, which enables us to generate pseudo-ground truth images by selecting clean pixels from the warped frame. The pseudo-clean images thus generated are effectively leveraged by our network to remove rain from images in a self-supervised manner without the need for a real rain paired dataset which is difficult to capture. Extensive evaluations on diverse real-world rainy datasets demonstrate that our approach achieves state-of-the-art performance in real image deraining, outperforming existing unsupervised methods.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104444"},"PeriodicalIF":4.3,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144604598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FedVLP: Visual-aware latent prompt generation for Multimodal Federated Learning
IF 4.3 · CAS Zone 3 · Computer Science
Computer Vision and Image Understanding, Volume 259, Article 104442 · Pub Date: 2025-07-08 · DOI: 10.1016/j.cviu.2025.104442
Hao Pan, Xiaoli Zhao, Yuchen Jiang, Lipeng He, Bingquan Wang, Yincan Shu
{"title":"FedVLP: Visual-aware latent prompt generation for Multimodal Federated Learning","authors":"Hao Pan ,&nbsp;Xiaoli Zhao ,&nbsp;Yuchen Jiang ,&nbsp;Lipeng He ,&nbsp;Bingquan Wang ,&nbsp;Yincan Shu","doi":"10.1016/j.cviu.2025.104442","DOIUrl":"10.1016/j.cviu.2025.104442","url":null,"abstract":"<div><div>Recent studies indicate that prompt learning based on CLIP-like models excels in a variety of image recognition and detection tasks, consequently, it has been applied in Multimodal Federated Learning (MMFL). Federated Prompt Learning (FPL), as a technical branch of MMFL, enables clients and servers to exchange prompts rather than model parameters during communication to address challenges such as data heterogeneity and high training costs. Many existing FPL methods rely heavily on pre-trained visual-language models, making it difficult for them to handle new and real specialized domain data. To further boost the generalization ability of FPL without compromising the personalization of clients, we propose a novel framework that generates prompts guided by visual semantics to better handle specialized and small-scale data. In our approach, each client generates visual-aware latent prompts using a Fusion Encoder and an IE-Module, enabling the learning of fine-grained knowledge. Through federated computation, clients collaboratively maintain a global prompt, allowing the learning of coarse-grained knowledge. FedVLP removes the dependency on manually designed prompt templates and demonstrates superior performance across seven datasets, including CIFAR-10, CIFAR-100, Caltech-101, FLIndustry-100, and others.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104442"},"PeriodicalIF":4.3,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144595957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PDCNet: A lightweight and efficient robotic grasp detection framework via Partial Convolution and knowledge distillation
IF 4.3 · CAS Zone 3 · Computer Science
Computer Vision and Image Understanding, Volume 259, Article 104441 · Pub Date: 2025-07-07 · DOI: 10.1016/j.cviu.2025.104441
Yanshu Jiang, Yanze Fang, Liwei Deng
{"title":"PDCNet: A lightweight and efficient robotic grasp detection framework via Partial Convolution and knowledge distillation","authors":"Yanshu Jiang,&nbsp;Yanze Fang,&nbsp;Liwei Deng","doi":"10.1016/j.cviu.2025.104441","DOIUrl":"10.1016/j.cviu.2025.104441","url":null,"abstract":"<div><div>Improving detection accuracy complicates robotic grasp models, which makes deploying them on resource-constrained edge AI devices more challenging. Although various lightweight strategies have been proposed, directly designing compact networks may not be optimal, as balancing accuracy and model size is challenging. This paper proposes a lightweight grasp detection framework, PDCNet. In response to this problem, we optimize the interplay between computational demands and detection performance. The method integrates Partial Convolution (PConv) for efficient feature extraction, Discrete Wavelet Transform (DWT) for enhancing frequency-domain feature representation, and a Cross-Stage Fusion (CSF) strategy for optimizing the utilization of multi-scale features. A Quality-Enhanced Huber Loss Function (Q-Huber) is also introduced to improve the network’s sensitivity to vital grasp localities. Finally, the teacher–student framework distills expertise into a compact student model. Comprehensive evaluations were conducted using the public datasets to demonstrate that PDCNet achieves detection accuracies of 98.7%, 95.8%, and 97.1% on Cornell, Jacquard and Jacquard_V2 datasets respectively, while maintaining minimal parameters and high computational efficiency. Real-world experiments on an embedded edge AI device further validate the capability of PDCNet to perform accurate grasp detection under limited computational resources.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104441"},"PeriodicalIF":4.3,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144604599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A method for absolute pose regression based on cascaded attention modules
IF 4.3 · CAS Zone 3 · Computer Science
Computer Vision and Image Understanding, Volume 259, Article 104440 · Pub Date: 2025-07-05 · DOI: 10.1016/j.cviu.2025.104440
Xiaogang Song, Junjie Tang, Kaixuan Yang, Weixuan Guo, Xiaofeng Lu, Xinhong Hei
{"title":"A method for absolute pose regression based on cascaded attention modules","authors":"Xiaogang Song ,&nbsp;Junjie Tang ,&nbsp;Kaixuan Yang ,&nbsp;Weixuan Guo ,&nbsp;Xiaofeng Lu ,&nbsp;Xinhong Hei","doi":"10.1016/j.cviu.2025.104440","DOIUrl":"10.1016/j.cviu.2025.104440","url":null,"abstract":"<div><div>The absolute camera pose regression estimates the position and orientation of the camera solely based on captured RGB images. However, current single-image techniques often lack robustness, resulting in significant outliers. To address the issues of pose regressors in repetitive textures and dynamic blur scenarios, this paper proposes an absolute pose regression method based on cascaded attention modules. This network integrates global and local information through cascaded attention modules and then employs a dual-stream attention module to reduce the impact of dynamic objects and lighting changes on localization performance by constructing dual-channel dependencies. Specifically, the cascaded attention modules guide the model to focus on the relationships between global and local features and establish long-range channel dependencies, enabling the network to learn richer multi-scale feature representations. Additionally, a dual-stream attention module is introduced to further enhance feature representation by closely associating spatial and channel dimensions. This method is evaluated and analyzed on various indoor and outdoor datasets, with our method reducing the median position error and orientation error to 0.19 m/<span><math><mrow><mn>7</mn><mo>.</mo><mn>44</mn><mo>°</mo></mrow></math></span> on 7-Scenes and 7.09 m/<span><math><mrow><mn>1</mn><mo>.</mo><mn>45</mn><mo>°</mo></mrow></math></span> on RobotCar, demonstrating that the proposed method can significantly improve localization performance. Ablation studies on multiple categories further verify the effectiveness of the proposed modules.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104440"},"PeriodicalIF":4.3,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144595956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AdaptDiff: Adaptive diffusion learning for low-light image enhancement
IF 4.3 · CAS Zone 3 · Computer Science
Computer Vision and Image Understanding, Volume 259, Article 104439 · Pub Date: 2025-07-04 · DOI: 10.1016/j.cviu.2025.104439
Xiaotao Shao, Guipeng Zhang, Yan Shen, Boyu Zhang, Zhongli Wang, Yanlong Sun
{"title":"AdaptDiff: Adaptive diffusion learning for low-light image enhancement","authors":"Xiaotao Shao ,&nbsp;Guipeng Zhang ,&nbsp;Yan Shen ,&nbsp;Boyu Zhang ,&nbsp;Zhongli Wang ,&nbsp;Yanlong Sun","doi":"10.1016/j.cviu.2025.104439","DOIUrl":"10.1016/j.cviu.2025.104439","url":null,"abstract":"<div><div>Recovering details obscured by noise from low-light images is a challenging task. Recent diffusion models have achieved relatively promising results in low-level vision tasks. However, there are still two issues: (1) under non-uniform illumination conditions, the low-light image cannot be restored with high quality, and (2) the models have limited generalization capabilities. To solve these problems, this paper proposes an Adaptive Enhancement Algorithm guided by a Multi-scale Structural Diffusion (AdaptDiff). AdaptDiff employs adaptive high-order mapping curves (AHMC) for pixel-by-pixel mapping of the image during the diffusion process, thereby adjusting the brightness levels between different regions within the image. In addition, a multi-scale structural guidance approach (MSGD) is proposed as an implicit bias, informing the intermediate layers of the model about the structural characteristics of the image, facilitating more effective restoration of clear images. Guiding the diffusion direction through structural information is conducive to maintaining good performance of the model even when faced with data that it has not previously encountered. Extensive experiments on popular benchmarks show that AdaptDiff achieves superior performance and efficiency.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104439"},"PeriodicalIF":4.3,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144595955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distribution-aware contrastive learning for domain adaptation in 3D LiDAR segmentation
IF 4.3 · CAS Zone 3 · Computer Science
Computer Vision and Image Understanding, Volume 259, Article 104438 · Pub Date: 2025-07-03 · DOI: 10.1016/j.cviu.2025.104438
Lamiae El Mendili, Sylvie Daniel, Thierry Badard
{"title":"Distribution-aware contrastive learning for domain adaptation in 3D LiDAR segmentation","authors":"Lamiae El Mendili,&nbsp;Sylvie Daniel,&nbsp;Thierry Badard","doi":"10.1016/j.cviu.2025.104438","DOIUrl":"10.1016/j.cviu.2025.104438","url":null,"abstract":"<div><div>Semantic segmentation of 3D LiDAR point clouds is very important for applications like autonomous driving and digital twins of cities. However, current deep learning models suffer from a significant generalization gap. Unsupervised Domain Adaptation methods have recently emerged to tackle this issue. While domain-invariant feature learning using Maximum Mean Discrepancy has shown promise for images due to its simplicity, its application remains unexplored in outdoor mobile mapping point clouds. Moreover, previous methods do not consider the class information, which can lead to suboptimal adaptation performance. We propose a new approach—Contrastive Maximum Mean Discrepancy—to maximize intra-class domain alignment and minimize inter-class domain discrepancy, and integrate it into a 3D semantic segmentation model for LiDAR point clouds. The evaluation of our method with large-scale UDA datasets shows that it surpasses state-of-the-art UDA approaches for 3D LiDAR point clouds. CMMD is a promising UDA approach with strong potential for point cloud semantic segmentation.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104438"},"PeriodicalIF":4.3,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144549098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Continuous hand gesture recognition: Benchmarks and methods
IF 4.3 · CAS Zone 3 · Computer Science
Computer Vision and Image Understanding, Volume 259, Article 104435 · Pub Date: 2025-07-02 · DOI: 10.1016/j.cviu.2025.104435
Marco Emporio, Amirpouya Ghasemaghaei, Joseph J. Laviola Jr., Andrea Giachetti
{"title":"Continuous hand gesture recognition: Benchmarks and methods","authors":"Marco Emporio ,&nbsp;Amirpouya Ghasemaghaei ,&nbsp;Joseph J. Laviola Jr. ,&nbsp;Andrea Giachetti","doi":"10.1016/j.cviu.2025.104435","DOIUrl":"10.1016/j.cviu.2025.104435","url":null,"abstract":"<div><div>In this paper, we review the existing benchmarks for continuous gesture recognition, e.g., the online analysis of hand movements over time to detect and recognize meaningful gestures from a specific dictionary. Focusing on human–computer interaction scenarios, we classify these benchmarks based on input data types, gesture dictionaries, and evaluation metrics. Specific metrics for the continuous recognition task are crucial for understanding how effectively gestures are spotted in real time within input streams. We also discuss the most effective detection and classification methods proposed for these benchmarks. Our findings indicate that the number and quality of publicly available datasets remain limited, and evaluation methodologies for continuous recognition are not yet standardized. These issues highlight the need for new benchmarks that reflect real-world usage conditions and can support the development of best practices in gesture-based interface design.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104435"},"PeriodicalIF":4.3,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144569865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Rethinking the sparse mask learning mechanism in sparse convolution for object detection on drone images
IF 4.3 · CAS Zone 3 · Computer Science
Computer Vision and Image Understanding, Volume 259, Article 104432 · Pub Date: 2025-07-01 · DOI: 10.1016/j.cviu.2025.104432
Yixuan Li, Pengnian Wu, Meng Zhang
{"title":"Rethinking the sparse mask learning mechanism in sparse convolution for object detection on drone images","authors":"Yixuan Li ,&nbsp;Pengnian Wu ,&nbsp;Meng Zhang","doi":"10.1016/j.cviu.2025.104432","DOIUrl":"10.1016/j.cviu.2025.104432","url":null,"abstract":"<div><div>Although sparse convolutional neural networks have achieved significant progress in fast object detection on high-resolution drone images, the research community has yet to pay enough attention to the great potential of prior knowledge (i.e., local contextual information) in UAV imagery for assisting sparse masks to improve detector performance. Such prior knowledge is beneficial for object detection in complex drone imagery, as tiny objects may be mistakenly detected or even missed entirely without referencing the local context surrounding them. In this paper, we take these priors into account and propose a crucial region learning strategy for sparse masks to boost object detection performance. Specifically, we extend the mask region from the feature region of the objects to their surrounding local context region and introduce a method for selecting and evaluating this local context region. Furthermore, we propose a novel mask-matching constraint to replace the mask activation ratio constraint, thereby enhancing object localization accuracy. We extensively evaluate our method across various detectors on two UAV benchmarks: VisDrone and UAVDT. By leveraging our mask learning strategy, the state-of-the-art sparse convolutional framework achieves higher detection gains with a faster detection speed, demonstrating its significant superiority.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104432"},"PeriodicalIF":4.3,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144549097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0