{"title":"CCANet:用于无人机目标检测的跨尺度上下文聚合网络","authors":"Lei Shang , Qihan He , Huan Lei , Wenyuan Yang","doi":"10.1016/j.cviu.2025.104472","DOIUrl":null,"url":null,"abstract":"<div><div>With the rapid advancement of deep learning technology, Unmanned Aerial Vehicle (UAV) object detection demonstrates significant potential across various fields. However, multi-scale object variations and complex environmental interference in UAV images present considerable challenges. This paper proposes a new UAV object detection network named Cross-scale Context Aggregation Network (CCANet), which contains Multi-scale Convolution Aggregation Darknet (MCADarknet) and Cross-scale Context Aggregation Feature Pyramid Network (CCA-FPN). First, MCADarknet serves as a multi-scale feature extraction network. It employs parallel multi-scale convolutional kernels and depth-wise strip convolution techniques to expand the network’s receptive field, extracting feature maps at four different scales layer by layer. Second, to address interference in complex scenes, a Context Enhanced Fusion method enhances the interaction between adjacent features extracted by MCADarknet and higher-level features to form intermediate features. Finally, CCA-FPN employs a cross-scale fusion strategy to deeply integrate shallow, intermediate, and deep feature information, thereby enhancing object representation in complex scenarios. Experimental results indicate that CCANet performs well on three public datasets. In particular, <span><math><msub><mrow><mo>mAP</mo></mrow><mrow><mn>50</mn></mrow></msub></math></span> and <span><math><msub><mrow><mo>mAP</mo></mrow><mrow><mn>50</mn><mo>−</mo><mn>95</mn></mrow></msub></math></span> can reach 47.4% and 29.4% respectively on the VisDrone dataset. Compared to the baseline model, it achieves improvements of 6.2% and 4.3%.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"260 ","pages":"Article 104472"},"PeriodicalIF":3.5000,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CCANet: A Cross-scale Context Aggregation Network for UAV object detection\",\"authors\":\"Lei Shang , Qihan He , Huan Lei , Wenyuan Yang\",\"doi\":\"10.1016/j.cviu.2025.104472\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the rapid advancement of deep learning technology, Unmanned Aerial Vehicle (UAV) object detection demonstrates significant potential across various fields. However, multi-scale object variations and complex environmental interference in UAV images present considerable challenges. This paper proposes a new UAV object detection network named Cross-scale Context Aggregation Network (CCANet), which contains Multi-scale Convolution Aggregation Darknet (MCADarknet) and Cross-scale Context Aggregation Feature Pyramid Network (CCA-FPN). First, MCADarknet serves as a multi-scale feature extraction network. It employs parallel multi-scale convolutional kernels and depth-wise strip convolution techniques to expand the network’s receptive field, extracting feature maps at four different scales layer by layer. Second, to address interference in complex scenes, a Context Enhanced Fusion method enhances the interaction between adjacent features extracted by MCADarknet and higher-level features to form intermediate features. 
Finally, CCA-FPN employs a cross-scale fusion strategy to deeply integrate shallow, intermediate, and deep feature information, thereby enhancing object representation in complex scenarios. Experimental results indicate that CCANet performs well on three public datasets. In particular, <span><math><msub><mrow><mo>mAP</mo></mrow><mrow><mn>50</mn></mrow></msub></math></span> and <span><math><msub><mrow><mo>mAP</mo></mrow><mrow><mn>50</mn><mo>−</mo><mn>95</mn></mrow></msub></math></span> can reach 47.4% and 29.4% respectively on the VisDrone dataset. Compared to the baseline model, it achieves improvements of 6.2% and 4.3%.</div></div>\",\"PeriodicalId\":50633,\"journal\":{\"name\":\"Computer Vision and Image Understanding\",\"volume\":\"260 \",\"pages\":\"Article 104472\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2025-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision and Image Understanding\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S107731422500195X\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S107731422500195X","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
CCANet: A Cross-scale Context Aggregation Network for UAV object detection
With the rapid advancement of deep learning technology, Unmanned Aerial Vehicle (UAV) object detection demonstrates significant potential across various fields. However, multi-scale object variations and complex environmental interference in UAV images present considerable challenges. This paper proposes a new UAV object detection network named Cross-scale Context Aggregation Network (CCANet), which contains Multi-scale Convolution Aggregation Darknet (MCADarknet) and Cross-scale Context Aggregation Feature Pyramid Network (CCA-FPN). First, MCADarknet serves as a multi-scale feature extraction network. It employs parallel multi-scale convolutional kernels and depth-wise strip convolution techniques to expand the network's receptive field, extracting feature maps at four different scales layer by layer. Second, to address interference in complex scenes, a Context Enhanced Fusion method enhances the interaction between adjacent features extracted by MCADarknet and higher-level features to form intermediate features. Finally, CCA-FPN employs a cross-scale fusion strategy to deeply integrate shallow, intermediate, and deep feature information, thereby enhancing object representation in complex scenarios. Experimental results indicate that CCANet performs well on three public datasets. In particular, mAP50 and mAP50–95 reach 47.4% and 29.4%, respectively, on the VisDrone dataset. Compared to the baseline model, this represents improvements of 6.2% and 4.3%.
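The abstract's description of MCADarknet (parallel multi-scale convolutional kernels combined with depth-wise strip convolutions to enlarge the receptive field) can be illustrated with a small sketch. The module and parameter names below (MultiScaleStripBlock, kernel_sizes) are illustrative assumptions, not the authors' released implementation; this is a minimal PyTorch approximation of the idea rather than the paper's exact architecture.

# Minimal, hypothetical sketch of the multi-scale strip-convolution aggregation
# idea described in the abstract. Names and kernel sizes are assumptions made
# for illustration, not the authors' implementation.
import torch
import torch.nn as nn


class MultiScaleStripBlock(nn.Module):
    """Aggregates parallel depth-wise strip convolutions of several kernel sizes."""

    def __init__(self, channels: int, kernel_sizes=(7, 11, 21)):
        super().__init__()
        self.branches = nn.ModuleList()
        for k in kernel_sizes:
            # A k x 1 followed by a 1 x k depth-wise convolution approximates a
            # k x k receptive field at much lower cost (the "strip convolution" idea).
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
            ))
        self.fuse = nn.Conv2d(channels, channels, 1)  # point-wise mixing after aggregation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = x
        for branch in self.branches:
            out = out + branch(x)  # sum the parallel multi-scale responses
        return self.fuse(out)      # channel mixing of the aggregated features


if __name__ == "__main__":
    feats = torch.randn(1, 64, 80, 80)   # e.g. one of the four backbone scales
    block = MultiScaleStripBlock(64)
    print(block(feats).shape)            # torch.Size([1, 64, 80, 80])

Pairing a k×1 with a 1×k depth-wise convolution is the standard way a strip convolution approximates a large k×k kernel at linear rather than quadratic cost, which matches the receptive-field expansion the abstract attributes to MCADarknet.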
Journal description:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems