DMSDA-YOLO: Dynamic Multiscale Dilated Attention for Remote Sensing Object Detection
Zhenghua Huang, Zijian Xu, Xi Li, Yaozong Zhang, Yu Shi, Qian Li, Hao Fang
IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5, 2025 (published 2025-08-07). DOI: 10.1109/LGRS.2025.3596809
https://ieeexplore.ieee.org/document/11119686/
Detecting multiscale targets, especially small objects, in remote sensing (RS) images with complex backgrounds is an extremely challenging task. This letter develops a novel RS object detection model based on YOLOv5, named dynamic multiscale dilated attention YOLO (DMSDA-YOLO), with two key improvements. First, in the backbone, a multiscale dilated attention fusion module (MDAFM) is proposed to capture multiscale feature information, and a coordinate anchor attention (CAA) mechanism is incorporated to focus on target regions while suppressing background interference. Second, in the neck, a spatial attention pyramid neck network is proposed to improve feature fusion, and a dynamic attention-aware feature extraction module (DAFEM) is introduced to make the network more adaptable to multiscale targets. Objective and subjective experimental results on the DIOR, HRRSD, and NWPU VHR-10 datasets demonstrate that DMSDA-YOLO outperforms existing state-of-the-art object detection approaches in detecting multiscale targets under complex backgrounds, and its competitive computational complexity makes it well suited to wide application.
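The abstract does not spell out how MDAFM works internally. As a rough, hedged illustration of the general idea its name points to (parallel dilated convolutions at several rates, fused with attention weights), the pure-Python sketch below runs one k × k kernel at dilation rates 1, 2, and 3, where a rate-d kernel spans k + (k − 1)(d − 1) pixels per side, and combines the branches with softmax weights. The function names, the rates, the shared kernel, and the rule that scores each branch by its global mean response are all hypothetical choices for illustration (loosely in the spirit of selective-kernel attention); this is not the authors' module.

```python
import math

def effective_kernel(k: int, d: int) -> int:
    # A k x k kernel with dilation d spans k + (k - 1) * (d - 1) pixels per side.
    return k + (k - 1) * (d - 1)

def dilated_conv2d(x, w, d):
    """Single-channel 2D cross-correlation with dilation d and zero padding.

    x is an H x W list of lists, w is a k x k list of lists with odd k;
    the output keeps the H x W size ("same" padding).
    """
    h, wd, k = len(x), len(x[0]), len(w)
    c = k // 2  # kernel centre index
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            s = 0.0
            for a in range(k):
                for b in range(k):
                    # Taps are spaced d pixels apart around (i, j).
                    y, z = i + (a - c) * d, j + (b - c) * d
                    if 0 <= y < h and 0 <= z < wd:  # zero padding outside
                        s += w[a][b] * x[y][z]
            out[i][j] = s
    return out

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(t - m) for t in v]
    total = sum(e)
    return [t / total for t in e]

def multiscale_dilated_fusion(x, w, rates=(1, 2, 3)):
    """Fuse parallel dilated branches with input-dependent softmax weights.

    Hypothetical scoring rule: each branch is scored by its global mean
    response, and the scores are softmax-normalised across branches.
    """
    branches = [dilated_conv2d(x, w, d) for d in rates]
    h, wd = len(x), len(x[0])
    scores = [sum(map(sum, br)) / (h * wd) for br in branches]
    alphas = softmax(scores)
    fused = [[sum(a * br[i][j] for a, br in zip(alphas, branches))
              for j in range(wd)] for i in range(h)]
    return fused, alphas

if __name__ == "__main__":
    print([effective_kernel(3, d) for d in (1, 2, 3)])  # [3, 5, 7]
    impulse = [[0.0] * 7 for _ in range(7)]
    impulse[3][3] = 1.0
    fused, alphas = multiscale_dilated_fusion(impulse, [[1.0] * 3 for _ in range(3)])
    print(alphas)
```

With a single-pixel input, each branch spreads that pixel over a 3 × 3 grid of taps spaced 1, 2, or 3 pixels apart, so larger rates see wider context with the same number of weights. A trained module would presumably replace the fixed mean-response scoring with learned layers.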