Lingyun Tian; Qiang Shen; Zilong Deng; Yang Gao; Simiao Wang
{"title":"基于掩模制导的可见光-红外车辆检测交叉模态融合网络","authors":"Lingyun Tian;Qiang Shen;Zilong Deng;Yang Gao;Simiao Wang","doi":"10.1109/LSP.2025.3562816","DOIUrl":null,"url":null,"abstract":"Drone-based vehicle detection is crucial for intelligent traffic management. However, current methods relying solely on single visible or infrared modalities struggle with precision and robustness, especially in adverse weather conditions. The effective integration of cross-modal information to enhance vehicle detection still poses significant challenges. In this letter, we propose a masked-guided cross-modality fusion method, called MCMF, for robust and accurate visible-infrared vehicle detection. Firstly, we construct a framework consisting of three branches, with two dedicated to the visible and infrared modalities respectively, and another tailored for the fused multi-modal. Secondly, we introduce a Location-Sensitive Masked AutoEncoder (LMAE) for intermediate-level feature fusion. Specifically, our LMAE utilizes masks to cover intermediate-level features of one modality based on the prediction hierarchy of another modality, and then distills cross-modality guidance information through regularization constraints. This strategy, through a self-learning paradigm, effectively preserves the useful information from both modalities while eliminating redundant information from each. Finally, the fused features are input into an uncertainty-based detection head to generate predictions for bounding boxes of vehicles. When evaluated on the DroneVehicle dataset, our MCIF reaches 71.42% w.r..t. mAP, outperforming an established baseline method by 7.42%. Ablation studies further demonstrate the effectiveness of our LMAE for visible-infrared fusion.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1815-1819"},"PeriodicalIF":3.2000,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Mask-Guided Cross-Modality Fusion Network for Visible-Infrared Vehicle Detection\",\"authors\":\"Lingyun Tian;Qiang Shen;Zilong Deng;Yang Gao;Simiao Wang\",\"doi\":\"10.1109/LSP.2025.3562816\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Drone-based vehicle detection is crucial for intelligent traffic management. However, current methods relying solely on single visible or infrared modalities struggle with precision and robustness, especially in adverse weather conditions. The effective integration of cross-modal information to enhance vehicle detection still poses significant challenges. In this letter, we propose a masked-guided cross-modality fusion method, called MCMF, for robust and accurate visible-infrared vehicle detection. Firstly, we construct a framework consisting of three branches, with two dedicated to the visible and infrared modalities respectively, and another tailored for the fused multi-modal. Secondly, we introduce a Location-Sensitive Masked AutoEncoder (LMAE) for intermediate-level feature fusion. Specifically, our LMAE utilizes masks to cover intermediate-level features of one modality based on the prediction hierarchy of another modality, and then distills cross-modality guidance information through regularization constraints. This strategy, through a self-learning paradigm, effectively preserves the useful information from both modalities while eliminating redundant information from each. 
Finally, the fused features are input into an uncertainty-based detection head to generate predictions for bounding boxes of vehicles. When evaluated on the DroneVehicle dataset, our MCIF reaches 71.42% w.r..t. mAP, outperforming an established baseline method by 7.42%. Ablation studies further demonstrate the effectiveness of our LMAE for visible-infrared fusion.\",\"PeriodicalId\":13154,\"journal\":{\"name\":\"IEEE Signal Processing Letters\",\"volume\":\"32 \",\"pages\":\"1815-1819\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-04-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Signal Processing Letters\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10971225/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10971225/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Mask-Guided Cross-Modality Fusion Network for Visible-Infrared Vehicle Detection
Drone-based vehicle detection is crucial for intelligent traffic management. However, current methods relying solely on a single visible or infrared modality struggle with precision and robustness, especially in adverse weather conditions. The effective integration of cross-modal information to enhance vehicle detection still poses significant challenges. In this letter, we propose a mask-guided cross-modality fusion method, called MCMF, for robust and accurate visible-infrared vehicle detection. Firstly, we construct a framework consisting of three branches: two dedicated to the visible and infrared modalities, respectively, and a third tailored to the fused multi-modal features. Secondly, we introduce a Location-Sensitive Masked AutoEncoder (LMAE) for intermediate-level feature fusion. Specifically, our LMAE utilizes masks to cover intermediate-level features of one modality based on the prediction hierarchy of the other modality, and then distills cross-modality guidance information through regularization constraints. Through a self-learning paradigm, this strategy effectively preserves the useful information from both modalities while eliminating the redundant information in each. Finally, the fused features are input into an uncertainty-based detection head to generate bounding-box predictions for vehicles. When evaluated on the DroneVehicle dataset, our MCMF reaches 71.42% mAP, outperforming an established baseline method by 7.42%. Ablation studies further demonstrate the effectiveness of our LMAE for visible-infrared fusion.
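The abstract's central mechanism, using one modality's prediction confidence to decide which of the other modality's intermediate features to mask, reconstruct, and regularize, can be illustrated with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions only: the names (`conf_to_mask`, `MaskGuidedFusion`), the top-k masking rule, the light convolutional decoder, and the MSE regularization term are all hypothetical, since the paper's actual LMAE architecture, masking rule, and loss formulation are not given in the abstract.

```python
# Hypothetical sketch of a mask-guided cross-modality fusion step (not the
# paper's actual LMAE implementation). Modality B's per-location confidence
# selects which of modality A's intermediate features are kept; the masked
# features are reconstructed by a light decoder, and an L2 reconstruction
# term stands in for the "regularization constraint" in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conf_to_mask(conf_map: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Binary mask keeping the top `keep_ratio` most confident locations.

    conf_map: (B, 1, H, W) confidence from one modality's prediction head,
    used to decide which features of the *other* modality to keep (1) or
    mask out (0). The top-k rule is an assumption for illustration.
    """
    b, _, h, w = conf_map.shape
    flat = conf_map.flatten(1)                       # (B, H*W)
    k = max(1, int(keep_ratio * flat.shape[1]))
    thresh = flat.topk(k, dim=1).values[:, -1:]      # per-sample k-th largest value
    return (flat >= thresh).float().view(b, 1, h, w)


class MaskGuidedFusion(nn.Module):
    """Masks modality A's features using modality B's confidence, reconstructs
    them, and fuses the result with modality B's features."""

    def __init__(self, channels: int):
        super().__init__()
        self.decoder = nn.Sequential(                # stand-in for an MAE-style decoder
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feat_a, feat_b, conf_b):
        mask = conf_to_mask(conf_b)                  # guidance from modality B
        recon = self.decoder(feat_a * mask)          # reconstruct masked A features
        recon_loss = F.mse_loss(recon, feat_a.detach())  # cross-modal regularization
        fused = self.fuse(torch.cat([recon, feat_b], dim=1))
        return fused, recon_loss
```

In a full three-branch framework, `fused` would feed the detection branch (the paper's uncertainty-based head is not sketched here), and `recon_loss` would be added to the detection loss so that the self-learning reconstruction objective shapes the shared features during training.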
Journal Introduction:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.