{"title":"RAFNet: Rotation-aware anchor-free framework for geospatial object detection","authors":"Liwei Deng , Yangyang Tan , Songyu Chen","doi":"10.1016/j.cviu.2025.104373","DOIUrl":null,"url":null,"abstract":"<div><div>Object detection in remote sensing images plays a crucial role in applications such as disaster monitoring, and urban planning. However, detecting small and rotated objects in complex backgrounds remains a significant challenge. Traditional anchor-based methods, which rely on preset anchor boxes with fixed sizes and aspect ratios, face three core limitations: geometric mismatch (difficulty adapting to rotated objects and feature confusion caused by dense anchor boxes), missed detection of small objects (feature loss due to the decoupling between anchor boxes and feature map strides), and parameter sensitivity (requiring complex anchor box combinations for multi-scale targets).</div><div>To address these challenges, this paper proposes an anchor-free detection framework, RAFNet, integrating three key innovations: Mona Swin Transformer as the backbone to enhance feature extraction, Rotated Feature Pyramid Network (Rotated FPN) for rotation-aware feature representation, and Local Importance-based Attention (LIA) mechanism to focus on critical regions and improve object feature representation. Extensive experiments on the DOTA1.0 dataset demonstrate that RAFNet achieves a mean Average Precision (mAP) of 74.91, outperforming baseline models by 3.24%, with significant improvements in challenging categories such as helicopters (+32.5% AP) and roundabouts (+4% AP). The model achieves the mAP of 30.29% on the STAR dataset, validating its high adaptability and robustness in generalization tasks. These results highlight the effectiveness of the proposed method in detecting small, rotated objects in complex scenes. 
RAFNet offers a more flexible, efficient, and generalizable solution for remote sensing object detection, underscoring the great potential of anchor-free approaches in this field.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"257 ","pages":"Article 104373"},"PeriodicalIF":4.3000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314225000967","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Object detection in remote sensing images plays a crucial role in applications such as disaster monitoring and urban planning. However, detecting small and rotated objects against complex backgrounds remains a significant challenge. Traditional anchor-based methods, which rely on preset anchor boxes with fixed sizes and aspect ratios, face three core limitations: geometric mismatch (difficulty adapting to rotated objects, and feature confusion caused by dense anchor boxes), missed detection of small objects (feature loss due to the decoupling between anchor boxes and feature map strides), and parameter sensitivity (requiring complex anchor box combinations for multi-scale targets).
To address these challenges, this paper proposes an anchor-free detection framework, RAFNet, integrating three key innovations: the Mona Swin Transformer as the backbone to enhance feature extraction, a Rotated Feature Pyramid Network (Rotated FPN) for rotation-aware feature representation, and a Local Importance-based Attention (LIA) mechanism to focus on critical regions and improve object feature representation. Extensive experiments on the DOTA1.0 dataset demonstrate that RAFNet achieves a mean Average Precision (mAP) of 74.91%, outperforming baseline models by 3.24%, with significant improvements in challenging categories such as helicopters (+32.5% AP) and roundabouts (+4% AP). The model achieves an mAP of 30.29% on the STAR dataset, validating its adaptability and robustness in generalization tasks. These results highlight the effectiveness of the proposed method in detecting small, rotated objects in complex scenes. RAFNet offers a more flexible, efficient, and generalizable solution for remote sensing object detection, underscoring the great potential of anchor-free approaches in this field.
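The abstract does not spell out RAFNet's detection-head design, but the anchor-free, rotation-aware idea it contrasts with anchor-based methods can be illustrated in miniature. The sketch below (numpy only; the channel layout, stride handling, and function name are all hypothetical, not taken from the paper) decodes a dense per-location prediction map, including an angle channel, directly into rotated boxes. No preset anchor sizes or aspect ratios are involved: each feature-map cell maps back to image coordinates via the feature stride, which is the property the abstract credits with avoiding geometric mismatch.

```python
import numpy as np

def decode_rotated_boxes(pred, stride=8, score_thresh=0.5):
    """Decode a dense prediction map into rotated boxes, anchor-free style.

    pred: (H, W, 6) array whose channels are assumed to be
          [score, dx, dy, log_w, log_h, angle_rad] for each cell.
    Each cell (i, j) maps to image coordinates through the feature
    stride, so no anchor sizes or aspect ratios are predefined.
    Returns an (N, 6) array of [cx, cy, w, h, angle_rad, score].
    """
    H, W, _ = pred.shape
    boxes = []
    for i in range(H):
        for j in range(W):
            score, dx, dy, lw, lh, ang = pred[i, j]
            if score < score_thresh:
                continue
            cx = (j + 0.5 + dx) * stride          # cell center + learned offset
            cy = (i + 0.5 + dy) * stride
            w = stride * np.exp(lw)               # size regressed in log space
            h = stride * np.exp(lh)
            boxes.append([cx, cy, w, h, ang, score])
    return np.array(boxes).reshape(-1, 6)
```

A real detector would run such decoding per FPN level and follow it with rotated non-maximum suppression; this fragment only shows why the anchor-free formulation sidesteps the anchor/stride decoupling the abstract describes.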
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems