Rethinking the sparse mask learning mechanism in sparse convolution for object detection on drone images
Authors: Yixuan Li, Pengnian Wu, Meng Zhang
Journal: Computer Vision and Image Understanding, volume 259, Article 104432
Published: 2025-07-01
DOI: 10.1016/j.cviu.2025.104432
URL: https://www.sciencedirect.com/science/article/pii/S1077314225001559
Impact factor: 4.3 (JCR Q2, Computer Science, Artificial Intelligence; region 3, Computer Science)
Open access: no
Citations: 0
Abstract
Although sparse convolutional neural networks have achieved significant progress in fast object detection on high-resolution drone images, the research community has yet to pay enough attention to the great potential of prior knowledge (i.e., local contextual information) in UAV imagery for assisting sparse masks to improve detector performance. Such prior knowledge is beneficial for object detection in complex drone imagery, as tiny objects may be mistakenly detected or even missed entirely without referencing the local context surrounding them. In this paper, we take these priors into account and propose a crucial region learning strategy for sparse masks to boost object detection performance. Specifically, we extend the mask region from the feature region of the objects to their surrounding local context region and introduce a method for selecting and evaluating this local context region. Furthermore, we propose a novel mask-matching constraint to replace the mask activation ratio constraint, thereby enhancing object localization accuracy. We extensively evaluate our method across various detectors on two UAV benchmarks: VisDrone and UAVDT. By leveraging our mask learning strategy, the state-of-the-art sparse convolutional framework achieves higher detection gains with a faster detection speed, demonstrating its significant superiority.
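The two ideas in the abstract, extending the mask target from the object region to its surrounding local context and supervising where the mask is active rather than how much of it is active, can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the fixed `margin` used to grow each box, the squared-error form of the activation ratio penalty, and the use of binary cross-entropy as the matching term are all assumptions made for illustration; the abstract does not specify how the context region is selected or how the mask-matching constraint is defined.

```python
import numpy as np


def box_mask(shape, boxes):
    """Binary mask that is 1 inside each (x1, y1, x2, y2) box and 0 elsewhere."""
    mask = np.zeros(shape, dtype=np.float64)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = 1.0
    return mask


def context_mask(shape, boxes, margin):
    """Like box_mask, but each box is grown by `margin` cells on every side.

    Assumption: a fixed margin stands in for the paper's (unspecified here)
    procedure for selecting the local context region.
    """
    h, w = shape
    grown = [
        (max(x1 - margin, 0), max(y1 - margin, 0),
         min(x2 + margin, w), min(y2 + margin, h))
        for x1, y1, x2, y2 in boxes
    ]
    return box_mask(shape, grown)


def activation_ratio_loss(pred, target_ratio):
    """Hypothetical ratio constraint: penalises only the fraction of active cells."""
    return float((pred.mean() - target_ratio) ** 2)


def mask_matching_loss(pred, target, eps=1e-7):
    """Hypothetical matching constraint: per-cell binary cross-entropy."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)).mean())


# Toy 8x8 feature map with a single 2x2 object.
shape = (8, 8)
boxes = [(3, 3, 5, 5)]
target = context_mask(shape, boxes, margin=1)

# Two soft masks with the same activation ratio but different locations.
aligned = 0.9 * target + 0.1 * (1.0 - target)
shifted = np.roll(aligned, 4, axis=1)

ratio = float(target.mean())
print("ratio loss   ", activation_ratio_loss(aligned, ratio),
      activation_ratio_loss(shifted, ratio))
print("matching loss", mask_matching_loss(aligned, target),
      mask_matching_loss(shifted, target))
```

In this toy setting the ratio penalty cannot tell the aligned mask from the shifted one, because both activate the same fraction of cells, while the per-cell matching term is larger for the shifted mask. That matches the abstract's motivation for linking the constraint to localization accuracy, under the assumptions stated above.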
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems