APNet: Accurate Positioning Deformable Convolution for UAV Image Object Detection
Authors: Peiran Zhang; Guoxin Zhang; Kuihe Yang
Journal: IEEE Latin America Transactions
Publication date: 2024-03-14
DOI: 10.1109/TLA.2024.10472961
URL: https://ieeexplore.ieee.org/document/10472961/
Unmanned aerial vehicle (UAV) image object detection has received increasing attention in recent years because of its wide application in military and civil fields. Current object detection methods perform well in generic scenarios, but the large number of small objects and their extremely dense distribution in UAV images make them difficult to capture, resulting in sub-optimal performance. In this paper, we propose APNet, a UAV image object detection framework that addresses this issue through fine-grained deformable convolution (DC) and effective feature fusion. First, we design an accurate positioning deformable convolution (APDC), which changes the kernel shape dynamically to produce refined features, especially in regions where objects gather densely. Specifically, a positional information enhancement attention (PEA) is designed to generate more accurate convolutional position offsets depending on the object position. APDC therefore alleviates the inflexible deformation of vanilla DC and adapts better to the shapes of different objects, discriminating multiple objects in densely distributed areas in a fine-grained way. Second, we propose an effective cross-layer feature fusion (ECF) to integrate multi-scale features and aggregate attentive features dynamically. Extensive experiments conducted on VisDrone and UAVDT demonstrate the universality and effectiveness of our APNet, achieving 29.8 and 48.7 in mAP and mAP50, respectively. Compared to the state-of-the-art (SOTA) method, our APNet achieves an improvement of 2.2 and 3.5 in mAP and mAP50, respectively.
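The deformable sampling that APDC builds on can be illustrated with a minimal sketch. This is not the authors' APDC or PEA: it shows only vanilla deformable convolution at a single output location, with the offsets supplied by hand rather than predicted by an attention module. The function names `bilinear_sample` and `deformable_conv_at` are hypothetical and chosen for illustration.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D list `feature` at fractional (y, x).

    Reads that fall outside the feature map contribute 0.
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours, weighted by their distance to (y, x).
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * feature[yi][xi]
    return value


def deformable_conv_at(feature, kernel, offsets, cy, cx):
    """3x3 deformable convolution response at output location (cy, cx).

    kernel  : 9 weights in row-major order.
    offsets : 9 (dy, dx) pairs; offsets[k] shifts the k-th kernel tap
              away from its regular grid position (assumed given here).
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    total = 0.0
    for k, (gy, gx) in enumerate(grid):
        oy, ox = offsets[k]
        total += kernel[k] * bilinear_sample(feature, cy + gy + oy, cx + gx + ox)
    return total


if __name__ == "__main__":
    # Toy 4x4 feature map whose value grows linearly: f(y, x) = 4*y + x.
    feature = [[4 * r + c for c in range(4)] for r in range(4)]
    kernel = [1 / 9] * 9  # simple averaging kernel

    regular = deformable_conv_at(feature, kernel, [(0.0, 0.0)] * 9, 1, 1)
    shifted = deformable_conv_at(feature, kernel, [(0.5, 0.5)] * 9, 1, 1)
    print(regular, shifted)
```

With all offsets set to zero the function reduces to an ordinary 3x3 convolution. Non-zero offsets move each tap to a fractional position, which is what lets the effective kernel shape follow object geometry.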
About the journal:
IEEE Latin America Transactions (IEEE LATAM) is an interdisciplinary journal focused on the dissemination of original, high-quality research papers and review articles in Spanish and Portuguese on emerging topics in three main areas: Computing, Electric Energy and Electronics. Sub-areas of the journal include, but are not limited to: automatic control, communications, instrumentation, artificial intelligence, power and industrial electronics, fault diagnosis and detection, transportation electrification, internet of things, electrical machines, circuits and systems, biomedicine and biomedical / haptic applications, secure communications, robotics, sensors and actuators, computer networks, and smart grids.