DetailCaptureYOLO: Accurately Detecting Small Targets in UAV Aerial Images
Fengxi Sun, Ning He, Runjie Li, Hongfei Liu, Yuxiang Zou
Journal of Visual Communication and Image Representation, vol. 106, Article 104349 (impact factor 2.6; JCR Q2, Computer Science, Information Systems)
Published: 2024-11-28
DOI: 10.1016/j.jvcir.2024.104349
URL: https://www.sciencedirect.com/science/article/pii/S1047320324003055
Citations: 0
Abstract
Unmanned aerial vehicle (UAV) aerial imagery is dominated by small objects, so obtaining feature maps with more detailed information is crucial for target detection. This paper therefore presents an improved algorithm based on YOLOv9, named DetailCaptureYOLO, which has a strong ability to capture detailed features. First, a dynamic fusion path aggregation network is proposed to dynamically fuse multi-level and multi-scale feature maps, preserving information integrity and yielding richer features. Additionally, more flexible dynamic upsampling and wavelet transform-based downsampling operators are used to optimize the sampling operations. Finally, Inner-IoU is used within Powerful-IoU, enhancing the network’s ability to detect small targets. The neck improvement proposed in this paper can be transferred to mainstream object detection algorithms. When applied to YOLOv9, AP50, mAP and AP-small improved by 8.5%, 5.5% and 7.2%, respectively, on the VisDrone dataset. When applied to other algorithms, the improvements in AP50 were 5.1%–6.5%. Experimental results demonstrate that the proposed method excels in detecting small targets and exhibits strong transferability. The code is available at https://github.com/SFXSunFengXi/DetailCaptureYOLO.
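To make the Inner-IoU idea mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly described formulation in which both boxes are rescaled about their own centers by a scale factor before the overlap is computed. The function names, the `(x1, y1, x2, y2)` box format, and the default `ratio=0.75` are illustrative assumptions, and the combination with Powerful-IoU described in the paper is not reproduced here.

```python
def scaled_box(box, ratio):
    """Scale an (x1, y1, x2, y2) box about its own center by `ratio`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * ratio / 2.0
    half_h = (y2 - y1) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def iou(a, b):
    """Plain intersection-over-union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def inner_iou(pred, target, ratio=0.75):
    """IoU of the auxiliary boxes obtained by scaling both inputs by `ratio`.

    `ratio` is a hypothetical default chosen for illustration only.
    ratio < 1 shrinks the boxes, ratio > 1 enlarges them, ratio == 1 is plain IoU.
    """
    return iou(scaled_box(pred, ratio), scaled_box(target, ratio))


if __name__ == "__main__":
    pred = (0.0, 0.0, 10.0, 10.0)
    target = (5.0, 0.0, 15.0, 10.0)
    for r in (0.5, 1.0, 1.5):
        print(r, inner_iou(pred, target, r))
```

In this sketch the same pair of partially overlapping boxes scores lower with shrunken auxiliary boxes and higher with enlarged ones, which is the property that lets the scale factor change how strongly a given localization error is penalized.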
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.