RDCRNet: RGB-T Object Detection Network Based on Cross-Modal Representation Model

IF 2.1 | CAS Tier 3 (Physics and Astronomy) | JCR Q2 (PHYSICS, MULTIDISCIPLINARY)
Entropy | Pub Date: 2025-04-19 | DOI: 10.3390/e27040442
Yubin Li, Weida Zhan, Yichun Jiang, Jinxin Guo
{"title":"基于跨模态表示模型的RGB-T目标检测网络。","authors":"Yubin Li, Weida Zhan, Yichun Jiang, Jinxin Guo","doi":"10.3390/e27040442","DOIUrl":null,"url":null,"abstract":"<p><p>RGB-thermal object detection harnesses complementary information from visible and thermal modalities to enhance detection robustness in challenging environments, particularly under low-light conditions. However, existing approaches suffer from limitations due to their heavy dependence on precisely registered data and insufficient handling of cross-modal distribution disparities. This paper presents RDCRNet, a novel framework incorporating a Cross-Modal Representation Model to effectively address these challenges. The proposed network features a Cross-Modal Feature Remapping Module that aligns modality distributions through statistical normalization and learnable correction parameters, significantly reducing feature discrepancies between modalities. A Cross-Modal Refinement and Interaction Module enables sophisticated bidirectional information exchange via trinity refinement for intra-modal context modeling and cross-attention mechanisms for unaligned feature fusion. Multiscale detection capability is enhanced through a Cross-Scale Feature Integration Module, improving detection performance across various object sizes. To overcome the inherent data scarcity in RGB-T detection, we introduce a self-supervised pretraining strategy that combines masked reconstruction with adversarial learning and semantic consistency loss, effectively leveraging both aligned and unaligned RGB-T samples. Extensive experiments demonstrate that RDCRNet achieves state-of-the-art performance on multiple benchmark datasets while maintaining high computational and storage efficiency, validating its superiority and practical effectiveness in real-world applications.</p>","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"27 4","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12027132/pdf/","citationCount":"0","resultStr":"{\"title\":\"RDCRNet: RGB-T Object Detection Network Based on Cross-Modal Representation Model.\",\"authors\":\"Yubin Li, Weida Zhan, Yichun Jiang, Jinxin Guo\",\"doi\":\"10.3390/e27040442\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>RGB-thermal object detection harnesses complementary information from visible and thermal modalities to enhance detection robustness in challenging environments, particularly under low-light conditions. However, existing approaches suffer from limitations due to their heavy dependence on precisely registered data and insufficient handling of cross-modal distribution disparities. This paper presents RDCRNet, a novel framework incorporating a Cross-Modal Representation Model to effectively address these challenges. The proposed network features a Cross-Modal Feature Remapping Module that aligns modality distributions through statistical normalization and learnable correction parameters, significantly reducing feature discrepancies between modalities. A Cross-Modal Refinement and Interaction Module enables sophisticated bidirectional information exchange via trinity refinement for intra-modal context modeling and cross-attention mechanisms for unaligned feature fusion. Multiscale detection capability is enhanced through a Cross-Scale Feature Integration Module, improving detection performance across various object sizes. 
To overcome the inherent data scarcity in RGB-T detection, we introduce a self-supervised pretraining strategy that combines masked reconstruction with adversarial learning and semantic consistency loss, effectively leveraging both aligned and unaligned RGB-T samples. Extensive experiments demonstrate that RDCRNet achieves state-of-the-art performance on multiple benchmark datasets while maintaining high computational and storage efficiency, validating its superiority and practical effectiveness in real-world applications.</p>\",\"PeriodicalId\":11694,\"journal\":{\"name\":\"Entropy\",\"volume\":\"27 4\",\"pages\":\"\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2025-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12027132/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Entropy\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.3390/e27040442\",\"RegionNum\":3,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PHYSICS, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e27040442","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract


RGB-thermal object detection harnesses complementary information from visible and thermal modalities to enhance detection robustness in challenging environments, particularly under low-light conditions. However, existing approaches suffer from limitations due to their heavy dependence on precisely registered data and insufficient handling of cross-modal distribution disparities. This paper presents RDCRNet, a novel framework incorporating a Cross-Modal Representation Model to effectively address these challenges. The proposed network features a Cross-Modal Feature Remapping Module that aligns modality distributions through statistical normalization and learnable correction parameters, significantly reducing feature discrepancies between modalities. A Cross-Modal Refinement and Interaction Module enables sophisticated bidirectional information exchange via trinity refinement for intra-modal context modeling and cross-attention mechanisms for unaligned feature fusion. Multiscale detection capability is enhanced through a Cross-Scale Feature Integration Module, improving detection performance across various object sizes. To overcome the inherent data scarcity in RGB-T detection, we introduce a self-supervised pretraining strategy that combines masked reconstruction with adversarial learning and semantic consistency loss, effectively leveraging both aligned and unaligned RGB-T samples. Extensive experiments demonstrate that RDCRNet achieves state-of-the-art performance on multiple benchmark datasets while maintaining high computational and storage efficiency, validating its superiority and practical effectiveness in real-world applications.
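
The abstract names two mechanisms that are easy to make concrete: aligning modality distributions through statistical normalization with learnable correction parameters, and fusing unaligned features with cross-attention. The sketch below is a minimal PyTorch illustration of those two ideas only; the class names, shapes, and design choices are hypothetical assumptions, not the authors' published implementation (the trinity refinement and cross-scale integration modules are omitted).

```python
# Hypothetical sketch, NOT the authors' released code: it illustrates only
# (1) remapping each modality's feature distribution with statistical
# normalization plus learnable correction parameters, and (2) fusing the
# remapped, possibly unregistered features with bidirectional cross-attention.
# All names, shapes, and hyperparameters below are assumptions.
import torch
import torch.nn as nn


class FeatureRemap(nn.Module):
    """Whiten per-sample channel statistics, then apply a learnable
    scale/shift so both modalities land in a shared feature range."""

    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))   # learnable correction scale
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))   # learnable correction shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x) * self.gamma + self.beta


class CrossAttentionFusion(nn.Module):
    """Each modality queries the other, so fusion tolerates imperfect
    spatial registration between the RGB and thermal feature maps."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.rgb_from_t = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.t_from_rgb = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb.shape
        r = rgb.flatten(2).transpose(1, 2)     # (B, H*W, C) token sequence
        t = thermal.flatten(2).transpose(1, 2)
        r2, _ = self.rgb_from_t(r, t, t)       # RGB tokens attend to thermal
        t2, _ = self.t_from_rgb(t, r, r)       # thermal tokens attend to RGB
        return (r2 + t2).transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    remap_rgb, remap_t = FeatureRemap(64), FeatureRemap(64)
    fuse = CrossAttentionFusion(64)
    rgb_feat = remap_rgb(torch.randn(2, 64, 32, 32))
    t_feat = remap_t(torch.randn(2, 64, 32, 32))
    print(fuse(rgb_feat, t_feat).shape)        # torch.Size([2, 64, 32, 32])
```

In this sketch each modality gets its own correction parameters so both land in a shared statistical range before attention-based fusion, which is one plausible reading of the "remapping" described in the abstract.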

Source Journal
Entropy (PHYSICS, MULTIDISCIPLINARY)
CiteScore: 4.90
Self-citation rate: 11.10%
Articles published: 1580
Review time: 21.05 days
Journal introduction: Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers, and short notes. Our aim is to encourage scientists to publish their theoretical and experimental details in as much detail as possible. There is no restriction on the length of the papers. If the work involves computation or experiment, the details must be provided so that the results can be reproduced.