Unified diffusion-based object detection in multi-modal and low-light remote sensing images

Impact factor 0.7 · CAS Tier 4 (Engineering & Technology) · JCR Q4, Engineering, Electrical & Electronic
Xu Sun, Yinhui Yu, Qing Cheng
Electronics Letters, vol. 60, no. 22. Published 19 November 2024. DOI: 10.1049/ell2.70093
Full text: https://onlinelibrary.wiley.com/doi/10.1049/ell2.70093
Citations: 0

Abstract

Remote sensing object detection remains a challenge under complex conditions such as low light, adverse weather, and modality attacks or losses. Previous approaches typically alleviate this problem by enhancing visible images or leveraging multi-modal fusion technologies. In view of this, the authors propose a unified framework based on YOLO-World that combines the advantages of both schemes, achieving more adaptable and robust remote sensing object detection in complex real-world scenarios. This framework introduces a unified modality modelling strategy, allowing the model to learn abundant object features from multiple remote sensing datasets. Additionally, a U-fusion neck based on the diffusion method is designed to effectively remove modality-specific noise and generate missing complementary features. Extensive experiments were conducted on four remote sensing image datasets: the multimodal VEDAI and DroneVehicle datasets and the unimodal VisDrone and UAVDT datasets. This approach achieves average precision scores of 50.5%, 55.3%, 25.1%, and 20.7% respectively, outperforming advanced multimodal remote sensing object detection methods and low-light image enhancement techniques.
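The abstract's core idea, recovering fused features via a conditional reverse diffusion process even when a modality is noisy or missing, can be illustrated with a toy sketch. The paper itself is not open here, so everything below is an assumption: the schedule, the conditioning scheme, and the `toy_denoiser` (a stand-in for the learned U-fusion network, which here simply predicts the mean of whatever modality features are available) are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50                                  # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)      # standard DDPM-style noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def forward_noise(x0, t):
    """Sample q(x_t | x_0): add Gaussian noise to clean features x0 at step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def toy_denoiser(x_t, t, cond):
    """Stand-in for the learned fusion network: predict the clean fused
    features as the mean of the available conditioning modality features."""
    return np.mean(cond, axis=0)

def reverse_fuse(shape, cond, steps=T):
    """Run the reverse process from pure noise, conditioned on whichever
    modality features are available, to produce fused features."""
    x = rng.standard_normal(shape)
    for t in reversed(range(steps)):
        x0_hat = toy_denoiser(x, t, cond)   # predicted clean fused features
        if t > 0:
            x = forward_noise(x0_hat, t - 1)  # re-noise to step t-1
        else:
            x = x0_hat                        # final step: return the estimate
    return x

# Two modality feature maps (e.g. visible + infrared), 4 channels of 8x8.
rgb = rng.standard_normal((4, 8, 8))
ir = rng.standard_normal((4, 8, 8))

fused_both = reverse_fuse(rgb.shape, cond=[rgb, ir])  # both modalities present
fused_ir_only = reverse_fuse(rgb.shape, cond=[ir])    # visible modality missing
print(fused_both.shape, fused_ir_only.shape)
```

With a trivial mean-predicting denoiser the reverse process degenerates to averaging the available modalities; the point of the sketch is only the control flow, in which the same conditional sampling loop handles both the full multi-modal case and the missing-modality case.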

Source journal

Electronics Letters (Engineering: Electrical & Electronic)
CiteScore: 2.70
Self-citation rate: 0.00%
Articles per year: 268
Review time: 3.6 months

About the journal: Electronics Letters is an internationally renowned peer-reviewed rapid-communication journal that publishes short original research papers every two weeks. Its broad and interdisciplinary scope covers the latest developments in all electronic engineering related fields including communication, biomedical, optical and device technologies. Electronics Letters also provides further insight into some of the latest developments through special features and interviews. As a journal at the forefront of its field, Electronics Letters publishes papers covering all themes of electronic and electrical engineering, including:

- Antennas and Propagation
- Biomedical and Bioinspired Technologies, Signal Processing and Applications
- Control Engineering
- Electromagnetism: Theory, Materials and Devices
- Electronic Circuits and Systems
- Image, Video and Vision Processing and Applications
- Information, Computing and Communications
- Instrumentation and Measurement
- Microwave Technology
- Optical Communications
- Photonics and Opto-Electronics
- Power Electronics, Energy and Sustainability
- Radar, Sonar and Navigation
- Semiconductor Technology
- Signal Processing
- MIMO