Rethinking the sparse mask learning mechanism in sparse convolution for object detection on drone images

IF 3.5 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Vision and Image Understanding | Pub Date: 2025-07-01 | DOI: 10.1016/j.cviu.2025.104432

Yixuan Li, Pengnian Wu, Meng Zhang

Volume 259, Article 104432 | https://www.sciencedirect.com/science/article/pii/S1077314225001559


Abstract

Although sparse convolutional neural networks have achieved significant progress in fast object detection on high-resolution drone images, the research community has yet to pay enough attention to the great potential of prior knowledge (i.e., local contextual information) in UAV imagery for assisting sparse masks to improve detector performance. Such prior knowledge is beneficial for object detection in complex drone imagery, as tiny objects may be mistakenly detected or even missed entirely without referencing the local context surrounding them. In this paper, we take these priors into account and propose a crucial region learning strategy for sparse masks to boost object detection performance. Specifically, we extend the mask region from the feature region of the objects to their surrounding local context region and introduce a method for selecting and evaluating this local context region. Furthermore, we propose a novel mask-matching constraint to replace the mask activation ratio constraint, thereby enhancing object localization accuracy. We extensively evaluate our method across various detectors on two UAV benchmarks: VisDrone and UAVDT. By leveraging our mask learning strategy, the state-of-the-art sparse convolutional framework achieves higher detection gains with a faster detection speed, demonstrating its significant superiority.
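The abstract names two ingredients only at a high level: enlarging the sparse-mask target from the object region to its surrounding local context, and supervising the mask with a matching constraint instead of an activation-ratio constraint. The paper's actual procedure for selecting and evaluating the context region is not given here, so the following is a minimal Python sketch under explicit assumptions, not the authors' method. It assumes that each box is enlarged by a fixed, hypothetical `context_ratio` and rasterized onto a stride-8 feature grid, that the "mask-matching" loss is per-cell binary cross-entropy against that target, and that the ratio constraint is a squared gap between the mean activation and a target ratio. The functions `context_mask`, `mask_matching_loss`, and `activation_ratio_loss` are illustrative names, not part of any library.

```python
# Hypothetical sketch: NOT the authors' implementation. Names and the
# fixed context enlargement are illustrative assumptions.
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def context_mask(boxes: List[Box], img_w: int, img_h: int,
                 stride: int = 8, context_ratio: float = 0.5) -> List[List[int]]:
    """Binary target mask on a stride-`stride` feature grid.

    Each box is enlarged by `context_ratio` of its width/height on every
    side (assumed rule), so the mask covers the object plus a ring of
    local context rather than the object region alone.
    """
    gw, gh = math.ceil(img_w / stride), math.ceil(img_h / stride)
    mask = [[0] * gw for _ in range(gh)]
    for x1, y1, x2, y2 in boxes:
        dw, dh = (x2 - x1) * context_ratio, (y2 - y1) * context_ratio
        # Clamp the enlarged box to the image before mapping to grid cells.
        cx1, cy1 = max(0.0, x1 - dw), max(0.0, y1 - dh)
        cx2, cy2 = min(float(img_w), x2 + dw), min(float(img_h), y2 + dh)
        c0, c1 = int(cx1 // stride), math.ceil(cx2 / stride)
        r0, r1 = int(cy1 // stride), math.ceil(cy2 / stride)
        for r in range(r0, r1):
            for c in range(c0, c1):
                mask[r][c] = 1
    return mask


def mask_matching_loss(pred: List[List[float]], target: List[List[int]],
                       eps: float = 1e-6) -> float:
    """Assumed matching loss: mean per-cell binary cross-entropy."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            n += 1
    return total / n


def activation_ratio_loss(pred: List[List[float]], target_ratio: float) -> float:
    """Assumed ratio constraint: only the fraction of active cells matters."""
    flat = [p for row in pred for p in row]
    return (sum(flat) / len(flat) - target_ratio) ** 2


# Demo: one 16x16 object in a 128x128 image -> 16x16 feature grid.
boxes = [(40.0, 40.0, 56.0, 56.0)]
tgt = context_mask(boxes, img_w=128, img_h=128)
print(sum(map(sum, tgt)))  # 16 cells: object plus context ring

right = [[float(t) for t in row] for row in tgt]
# Same number of active cells (16), but in the wrong corner of the grid.
wrong = [[1.0 if r < 4 and c < 4 else 0.0 for c in range(16)] for r in range(16)]
ratio = sum(map(sum, tgt)) / 256

print(activation_ratio_loss(wrong, ratio))  # 0.0: ratio alone is satisfied
print(mask_matching_loss(wrong, tgt), mask_matching_loss(right, tgt))
```

The demo shows why, under these assumptions, a ratio-only penalty is weaker: it constrains how many cells fire, not which ones, so a mask that activates the right number of cells in the wrong place incurs zero ratio loss, while the per-cell matching loss penalizes it heavily. Setting `context_ratio=0.0` shrinks the target back to the object region alone (4 cells in this example).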


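Source journal

Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days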

Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in image understanding is covered, including papers offering insights that differ from predominant views. Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems