MCFusion: Frequency Domain Characteristics Enhancement and Feature Compensation Fusion Network for RGB-T Object Detection
Authors: Yinbo Gao; Zhuhua Liao; Yizhi Liu; Aiping Yi; Guoqiang Zhang
Journal: IEEE Sensors Journal, vol. 25, no. 11, pp. 20880-20893
Publication date: 2025-04-15
Publication type: Journal Article
DOI: 10.1109/JSEN.2025.3559057
URL: https://ieeexplore.ieee.org/document/10965904/
Impact factor: 4.3 (JCR Q1, Engineering, Electrical & Electronic)
Open access: No
Citations: 0
Abstract
Multimodal object detection based on visible-thermal vision sensors has drawn significant attention because it can achieve reliable detection in complex scenes with challenging lighting, such as low light or backlight. However, little attention has been paid to the frequency-domain feature information of the visible and thermal modalities themselves, or to the complementarity of cross-modal features. Furthermore, current visible-thermal feature fusion methods use only feature information from the current layer and neglect context information. To address these limitations, this article proposes a novel network framework that enhances frequency-domain characteristics and fuses cross-layer and cross-modal features. The framework introduces two key modules: the frequency domain characteristics enhancement (FCE) module and the cross-layer and cross-modal feature compensation fusion (CCF) module. The FCE module consists of two submodules. The reduce high-frequency information loss (FCE-RHFL) submodule operates on the visible modality and reduces high-frequency information loss using methods such as high-frequency masking and frequency recombination. The enhance high-frequency information representation (FCE-EHFR) submodule strengthens the high-frequency information representation of the thermal modality through convolutions with different kernels and frequency-domain enhancement techniques. In the CCF module, cross-modal and cross-layer feature compensation methods first compensate for differences between modalities, and a query-guided cross-attention mechanism then captures complementary information across modalities. Finally, experimental comparisons on the KAIST and FLIR datasets demonstrate that the method achieves excellent performance and robust detection results.
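The abstract gives no implementation details, so the following is only an illustrative sketch of two generic operations it names: splitting a feature map into low- and high-frequency bands with a frequency-domain mask and recombining them, and scaled dot-product cross-attention in which queries from one modality attend over features of the other. It is not the FCE or CCF module from the paper. All function names, the radial mask, and the `cutoff` and `high_gain` parameters are assumptions made for illustration, and the naive DFT is written in plain Python only to keep the example dependency-free; a real pipeline would use an FFT library.

```python
import cmath
import math


def dft(seq, inverse=False):
    """Naive 1-D discrete Fourier transform (O(n^2)); fine for tiny demos."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, x in enumerate(seq))
        out.append(acc / n if inverse else acc)
    return out


def dft2(grid, inverse=False):
    """2-D DFT applied separably: first along rows, then along columns."""
    rows = [dft(row, inverse) for row in grid]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def split_frequencies(feature, cutoff=0.25):
    """Split a 2-D map into low- and high-frequency parts with a radial mask.

    `cutoff` is a hypothetical threshold in cycles per sample.
    """
    h, w = len(feature), len(feature[0])
    spectrum = dft2(feature)
    low_spec = [[0j] * w for _ in range(h)]
    high_spec = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            # Wrapped frequency magnitude of bin (u, v) along each axis.
            fu, fv = min(u, h - u) / h, min(v, w - v) / w
            target = low_spec if math.hypot(fu, fv) <= cutoff else high_spec
            target[u][v] = spectrum[u][v]
    low = [[c.real for c in row] for row in dft2(low_spec, inverse=True)]
    high = [[c.real for c in row] for row in dft2(high_spec, inverse=True)]
    return low, high


def recombine(low, high, high_gain=1.0):
    """Recombine the two bands; high_gain > 1 amplifies the high band."""
    return [[lo + high_gain * hi for lo, hi in zip(low_row, high_row)]
            for low_row, high_row in zip(low, high)]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(wt / total * v[j] for wt, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

With `high_gain=1.0` the recombination reproduces the input map, because the two masked spectra sum to the original spectrum; larger values amplify the high-frequency band. In `cross_attention`, the queries would come from one modality (for example, visible features) and the keys and values from the other (thermal features), which is the general pattern behind cross-modal attention.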
About the journal:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave (e.g., electromagnetic and acoustic) and non-wave (e.g., chemical, gravity, particle, thermal, radiative and non-radiative) sensor data; detection, estimation and classification based on sensor data)
-Sensors in Industrial Practice