Zecong Ye, Hexiang Hao, Yueping Peng, Wei Tang, Xuekai Zhang, Baixuan Han, Haolong Zhai
{"title":"MBUDet:通过目标偏移量标签生成的非对准双峰无人机目标检测","authors":"Zecong Ye , Hexiang Hao , Yueping Peng , Wei Tang , Xuekai Zhang , Baixuan Han , Haolong Zhai","doi":"10.1016/j.inffus.2025.103756","DOIUrl":null,"url":null,"abstract":"<div><div>The widespread use of unmanned aerial vehicles (UAVs) has increased the demand for airborne target detection technologies in security and surveillance. The use of only infrared or visible detection technology is often limited by environmental factors and target characteristics. Consequently, the utilization of RGB-Infrared fusion techniques in detection has emerged as a key area of research. However, the alignment operation of multimodal images is quite time-consuming in practical UAV target detection missions. To address this challenge, we propose Misaligned Bimodal UAV Target Detection (MBUDet), which ingeniously integrates the two stages of target alignment and RGB-Infrared object detection into a process, thereby enhancing the detection speed. It primarily comprises four modules: size alignment, target alignment, modal weight calculation, and modal feature fusion. The size alignment module unifies the visible and infrared image sizes; The target alignment module uses existing bimodal target labels to generate target offset labels, which supervise the network to learn target feature alignment, and this module overcomes the effect of mosaic augmentation; the modal weight calculation module mainly solves the problem of a single modality appearing as a target resulting in the network not being able to learn it effectively; the modal feature fusion module focuses on enhancing the feature representations utilizing a spatial attention module. Experiments on our proposed Misaligned Bimodal UAV target dataset (MBU), MBUDet outperforms baseline by 4.8 % and 4.1 % in F1, and AP50 respectively. Also, the experimental results show that the method performs better than existing algorithms. The code associated with this study will be made publicly available soon at the following GitHub repository: <span><span>http://github.com/Yipzcc/MBUDet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103756"},"PeriodicalIF":15.5000,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MBUDet: Misaligned bimodal UAV target detection via target offset label generation\",\"authors\":\"Zecong Ye , Hexiang Hao , Yueping Peng , Wei Tang , Xuekai Zhang , Baixuan Han , Haolong Zhai\",\"doi\":\"10.1016/j.inffus.2025.103756\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The widespread use of unmanned aerial vehicles (UAVs) has increased the demand for airborne target detection technologies in security and surveillance. The use of only infrared or visible detection technology is often limited by environmental factors and target characteristics. Consequently, the utilization of RGB-Infrared fusion techniques in detection has emerged as a key area of research. However, the alignment operation of multimodal images is quite time-consuming in practical UAV target detection missions. To address this challenge, we propose Misaligned Bimodal UAV Target Detection (MBUDet), which ingeniously integrates the two stages of target alignment and RGB-Infrared object detection into a process, thereby enhancing the detection speed. 
It primarily comprises four modules: size alignment, target alignment, modal weight calculation, and modal feature fusion. The size alignment module unifies the visible and infrared image sizes; The target alignment module uses existing bimodal target labels to generate target offset labels, which supervise the network to learn target feature alignment, and this module overcomes the effect of mosaic augmentation; the modal weight calculation module mainly solves the problem of a single modality appearing as a target resulting in the network not being able to learn it effectively; the modal feature fusion module focuses on enhancing the feature representations utilizing a spatial attention module. Experiments on our proposed Misaligned Bimodal UAV target dataset (MBU), MBUDet outperforms baseline by 4.8 % and 4.1 % in F1, and AP50 respectively. Also, the experimental results show that the method performs better than existing algorithms. The code associated with this study will be made publicly available soon at the following GitHub repository: <span><span>http://github.com/Yipzcc/MBUDet</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"127 \",\"pages\":\"Article 103756\"},\"PeriodicalIF\":15.5000,\"publicationDate\":\"2025-09-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1566253525008188\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525008188","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
The widespread use of unmanned aerial vehicles (UAVs) has increased the demand for airborne target detection technologies in security and surveillance. The use of only infrared or visible detection technology is often limited by environmental factors and target characteristics. Consequently, the utilization of RGB-Infrared fusion techniques in detection has emerged as a key area of research. However, the alignment operation of multimodal images is quite time-consuming in practical UAV target detection missions. To address this challenge, we propose Misaligned Bimodal UAV Target Detection (MBUDet), which ingeniously integrates the two stages of target alignment and RGB-Infrared object detection into a process, thereby enhancing the detection speed. It primarily comprises four modules: size alignment, target alignment, modal weight calculation, and modal feature fusion. The size alignment module unifies the visible and infrared image sizes; The target alignment module uses existing bimodal target labels to generate target offset labels, which supervise the network to learn target feature alignment, and this module overcomes the effect of mosaic augmentation; the modal weight calculation module mainly solves the problem of a single modality appearing as a target resulting in the network not being able to learn it effectively; the modal feature fusion module focuses on enhancing the feature representations utilizing a spatial attention module. Experiments on our proposed Misaligned Bimodal UAV target dataset (MBU), MBUDet outperforms baseline by 4.8 % and 4.1 % in F1, and AP50 respectively. Also, the experimental results show that the method performs better than existing algorithms. The code associated with this study will be made publicly available soon at the following GitHub repository: http://github.com/Yipzcc/MBUDet.
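To make the title technique concrete, the sketch below shows one way a target offset label could be derived from a matched pair of bimodal bounding boxes once the images share a pixel coordinate frame after size alignment. The function names, the center-based offset, and the image-size normalization are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of target offset label generation, assuming each target
# carries a matched pair of axis-aligned boxes (visible / infrared) in the
# same pixel coordinate frame after size alignment.
import numpy as np

def box_center(box):
    """Center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])

def offset_label(vis_box, ir_box, img_w, img_h):
    """Offset of the infrared box relative to the visible box,
    normalized by image size so the label is scale-invariant."""
    delta = box_center(ir_box) - box_center(vis_box)
    return delta / np.array([img_w, img_h], dtype=np.float32)

# Example: the same vehicle labeled in both modalities of a 640x512 pair.
vis = (100, 80, 140, 120)   # visible-light box
ir  = (108, 74, 148, 114)   # infrared box, shifted by the misalignment
print(offset_label(vis, ir, 640, 512))  # roughly [0.0125, -0.0117]
```

Labels of this form can supervise an alignment branch directly, which is what lets the detector skip a separate image-registration stage at inference time.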
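The abstract states that the fusion module enhances feature representations with spatial attention. The PyTorch sketch below shows one common (CBAM-style) way such a bimodal fusion block can be built; the class name, the 1x1 merge convolution, and the max/mean attention design are assumptions for illustration, not MBUDet's actual module.

```python
# A minimal PyTorch sketch of spatial-attention-based bimodal feature fusion,
# assuming same-shape visible/infrared feature maps and a CBAM-style spatial
# attention map. Illustrates the general pattern only.
import torch
import torch.nn as nn

class SpatialAttentionFusion(nn.Module):
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        # 1x1 conv merges the concatenated modalities back to `channels`.
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Spatial attention: channel-wise max + mean maps -> conv -> sigmoid.
        self.attn_conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, feat_vis, feat_ir):
        fused = self.merge(torch.cat([feat_vis, feat_ir], dim=1))
        max_map, _ = fused.max(dim=1, keepdim=True)
        mean_map = fused.mean(dim=1, keepdim=True)
        attn = torch.sigmoid(self.attn_conv(torch.cat([max_map, mean_map], dim=1)))
        return fused * attn  # spatially re-weighted fused features

# Example: fuse two 256-channel feature maps from one backbone stage.
f_vis = torch.randn(1, 256, 40, 40)
f_ir  = torch.randn(1, 256, 40, 40)
out = SpatialAttentionFusion(256)(f_vis, f_ir)
print(out.shape)  # torch.Size([1, 256, 40, 40])
```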
Journal introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems are welcome.