Sheng Lu , Yangming Guo , Jiang Long , Zun Liu , Zhuqing Wang , Ying Li
{"title":"无人机航拍图像密集小目标检测算法","authors":"Sheng Lu , Yangming Guo , Jiang Long , Zun Liu , Zhuqing Wang , Ying Li","doi":"10.1016/j.imavis.2025.105485","DOIUrl":null,"url":null,"abstract":"<div><div>Unmanned aerial vehicle (UAV) aerial images make dense small target detection challenging due to the complex background, small object size in the wide field of view, low resolution, and dense target distribution. Many aerial target detection networks and attention-based methods have been proposed to enhance the capability of dense small target detection, but there are still problems, such as insufficient effective information extraction, missed detection, and false detection of small targets in dense areas. Therefore, this paper proposes a novel dense small target detection algorithm (DSTDA) for UAV aerial images suitable for various high-altitude complex environments. The core component of the proposed DSTDA consists of the multi-axis attention units, the adaptive feature transformation mechanism, and the target-guided sample allocation strategy. Firstly, by introducing the multi-axis attention units into DSTDA, the limitation of DSTDA on global information perception can be addressed. Thus, the detailed features and spatial relationships of small targets at long distances can be sufficiently extracted by our proposed algorithm. Secondly, an adaptive feature transformation mechanism is designed to flexibly adjust the feature map according to the characteristics of the target distribution, which enables the DSTDA to focus more on densely populated target areas. Lastly, a goal-oriented sample allocation strategy is presented, combining coarse screening based on positional information and fine screening guided by target prediction information. By employing this dynamic sample allocation from coarse to fine, the detection performance of small and dense targets in complex backgrounds is further improved. These above innovative improvements empower the DSTDA with enhanced global perception and target-focusing capabilities, effectively addressing the challenges of detecting dense small targets in complex aerial scenes. Experimental validation was conducted on three publicly available datasets: VisDrone, SIMD, and CARPK. The results showed that the proposed DSTDA outperforms other state-of-the-art algorithms in terms of comprehensive performance. The algorithm significantly improves the issues of false alarms and missed detection in drone-based target detection, showcasing remarkable accuracy and real-time performance. It proves to be proficient in the task of detecting dense small targets in drone scenarios.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105485"},"PeriodicalIF":4.2000,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dense small target detection algorithm for UAV aerial imagery\",\"authors\":\"Sheng Lu , Yangming Guo , Jiang Long , Zun Liu , Zhuqing Wang , Ying Li\",\"doi\":\"10.1016/j.imavis.2025.105485\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Unmanned aerial vehicle (UAV) aerial images make dense small target detection challenging due to the complex background, small object size in the wide field of view, low resolution, and dense target distribution. Many aerial target detection networks and attention-based methods have been proposed to enhance the capability of dense small target detection, but there are still problems, such as insufficient effective information extraction, missed detection, and false detection of small targets in dense areas. Therefore, this paper proposes a novel dense small target detection algorithm (DSTDA) for UAV aerial images suitable for various high-altitude complex environments. The core component of the proposed DSTDA consists of the multi-axis attention units, the adaptive feature transformation mechanism, and the target-guided sample allocation strategy. Firstly, by introducing the multi-axis attention units into DSTDA, the limitation of DSTDA on global information perception can be addressed. Thus, the detailed features and spatial relationships of small targets at long distances can be sufficiently extracted by our proposed algorithm. Secondly, an adaptive feature transformation mechanism is designed to flexibly adjust the feature map according to the characteristics of the target distribution, which enables the DSTDA to focus more on densely populated target areas. Lastly, a goal-oriented sample allocation strategy is presented, combining coarse screening based on positional information and fine screening guided by target prediction information. By employing this dynamic sample allocation from coarse to fine, the detection performance of small and dense targets in complex backgrounds is further improved. These above innovative improvements empower the DSTDA with enhanced global perception and target-focusing capabilities, effectively addressing the challenges of detecting dense small targets in complex aerial scenes. Experimental validation was conducted on three publicly available datasets: VisDrone, SIMD, and CARPK. The results showed that the proposed DSTDA outperforms other state-of-the-art algorithms in terms of comprehensive performance. The algorithm significantly improves the issues of false alarms and missed detection in drone-based target detection, showcasing remarkable accuracy and real-time performance. It proves to be proficient in the task of detecting dense small targets in drone scenarios.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"156 \",\"pages\":\"Article 105485\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-03-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885625000733\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625000733","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Dense small target detection algorithm for UAV aerial imagery
Unmanned aerial vehicle (UAV) aerial images make dense small target detection challenging due to the complex background, small object size in the wide field of view, low resolution, and dense target distribution. Many aerial target detection networks and attention-based methods have been proposed to enhance the capability of dense small target detection, but there are still problems, such as insufficient effective information extraction, missed detection, and false detection of small targets in dense areas. Therefore, this paper proposes a novel dense small target detection algorithm (DSTDA) for UAV aerial images suitable for various high-altitude complex environments. The core component of the proposed DSTDA consists of the multi-axis attention units, the adaptive feature transformation mechanism, and the target-guided sample allocation strategy. Firstly, by introducing the multi-axis attention units into DSTDA, the limitation of DSTDA on global information perception can be addressed. Thus, the detailed features and spatial relationships of small targets at long distances can be sufficiently extracted by our proposed algorithm. Secondly, an adaptive feature transformation mechanism is designed to flexibly adjust the feature map according to the characteristics of the target distribution, which enables the DSTDA to focus more on densely populated target areas. Lastly, a goal-oriented sample allocation strategy is presented, combining coarse screening based on positional information and fine screening guided by target prediction information. By employing this dynamic sample allocation from coarse to fine, the detection performance of small and dense targets in complex backgrounds is further improved. These above innovative improvements empower the DSTDA with enhanced global perception and target-focusing capabilities, effectively addressing the challenges of detecting dense small targets in complex aerial scenes. Experimental validation was conducted on three publicly available datasets: VisDrone, SIMD, and CARPK. The results showed that the proposed DSTDA outperforms other state-of-the-art algorithms in terms of comprehensive performance. The algorithm significantly improves the issues of false alarms and missed detection in drone-based target detection, showcasing remarkable accuracy and real-time performance. It proves to be proficient in the task of detecting dense small targets in drone scenarios.
期刊介绍:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.