Low-Rank Multimodal Remote Sensing Object Detection With Frequency Filtering Experts

IF 7.5 | Zone 1, Earth Sciences | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Geoscience and Remote Sensing Pub Date : 2024-08-21 DOI:10.1109/TGRS.2024.3446814

Xu Sun;Yinhui Yu;Qing Cheng

Volume 62, pp. 1-14. IEEE Xplore: https://ieeexplore.ieee.org/document/10643097/

Citations: 0

Abstract

Visible-infrared object detection in remote sensing images plays an important role in around-the-clock unmanned aerial vehicle (UAV) applications. Most existing work focuses on designing complex network architectures to fuse complementary features, while few methods consider computational complexity or susceptibility to modality attacks, which limits the deployment of state-of-the-art frameworks. In this article, we present a low-rank multimodal object detection approach with frequency filtering experts, called LF-MDet, which is built on the DINO (detection transformer with improved denoising anchor boxes) framework. The approach achieves more accurate detection with fewer computational resources. In particular, when one modality is attacked or missing, our method still maintains higher robustness to such perturbations. Specifically, we propose a low-rank enhancement technology (LET) and a dynamic illumination-aware mask (DIM) module that enable a single backbone network, in a batch formulation, to extract multimodal features in an unbiased and compatible manner. Furthermore, we design a lightweight frequency expert encoder (FEE) that works in the frequency domain to fuse complementary features efficiently by filtering out amplitude noise components and mixing feature tokens. Extensive experiments are conducted on the multimodal remote sensing object detection datasets VEDAI and DroneVehicle. The results demonstrate the superiority of the proposed approach over advanced multimodal remote sensing object detectors. Compared with the baseline method, LF-MDet reduces floating-point operations (FLOPs) by approximately 65% while improving detection accuracy. The code is available at https://github.com/cq100/LF-MDet.
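
The abstract describes fusing visible and infrared features by filtering amplitude components in the frequency domain. The sketch below illustrates only that general idea and is not the authors' FEE: it assumes a fixed radial low-pass mask (the paper's filter may well be learned), a plain average as the fusion step, and feature maps shaped (C, H, W). The names `amplitude_filter`, `fuse_modalities`, and `keep_ratio` are hypothetical.

```python
import numpy as np


def amplitude_filter(features, keep_ratio=0.25):
    """Suppress high-frequency amplitude in a (C, H, W) feature map.

    Assumption: a fixed radial low-pass mask stands in for whatever
    filter the paper actually uses.
    """
    _, h, w = features.shape
    # 2-D FFT over the spatial axes, split into amplitude and phase.
    spectrum = np.fft.fft2(features, axes=(-2, -1))
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Normalised frequency radius of every FFT bin.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Keep bins whose radius is within keep_ratio of the largest radius.
    mask = (radius <= keep_ratio * radius.max()).astype(float)

    # Filter the amplitude only, keep the phase, and transform back.
    filtered = (amplitude * mask) * np.exp(1j * phase)
    return np.fft.ifft2(filtered, axes=(-2, -1)).real


def fuse_modalities(visible_feat, infrared_feat, keep_ratio=0.25):
    """Average the two filtered feature maps (hypothetical fusion rule)."""
    return 0.5 * (
        amplitude_filter(visible_feat, keep_ratio)
        + amplitude_filter(infrared_feat, keep_ratio)
    )
```

With `keep_ratio=1.0` the mask keeps every bin, so the filter returns its input up to floating-point error; smaller values remove progressively more high-frequency content before the two modalities are combined.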

Source journal

IEEE Transactions on Geoscience and Remote Sensing (Engineering and Technology: Geochemistry and Geophysics)

CiteScore

11.50

Self-citation rate

28.00%

Articles published

1912

Review time

4.0 months

About the journal: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.