Lei Zhang, Keyan Dong, Yansong Song, Gong Zhang, Gangqi Yan, Tianci Liu, Yanbo Wang, Yuqing Li, Xinhang Li
Optics and Laser Technology, Volume 192, Article 113996 (published 2025-09-26). DOI: 10.1016/j.optlastec.2025.113996
Multimodal object detection method based on bidirectional dynamic sampling and adaptive Cross-Modal fusion
The integration of visible (RGB) and infrared (IR) images has garnered extensive attention in fields such as object detection, object tracking, and scene segmentation. The complementary perceptual information of these two modalities enables robust detection in all-weather and complex environments. Existing RGB-IR fusion methods have made notable progress in enhancing detection accuracy but still suffer from modality alignment errors and insufficient feature fusion, which severely undermine detection performance. To address these challenges, we propose a model named Dynamic Communication Transformer (DynaComFormer), which simultaneously resolves the problems of modality alignment errors and insufficient fusion precision. Within DynaComFormer, we design two modules: the Bidirectional Adaptive Sampling Module (BASM) and the Cross-Complementary Fusion Module (C2Fusion module). The BASM improves the feature alignment accuracy between modalities through a dynamically guided sampling strategy while effectively reducing computational complexity. The C2Fusion module leverages self-attention mechanisms to establish efficient information interaction channels between the two modalities, achieving complementary fusion of deep semantic features. We conducted experimental analyses in complex environments such as rainy weather, strong light, and smoke. The results demonstrate that our detection performance surpasses that of other fusion and non-fusion algorithms by 2% to 10%. The code is available at https://github.com/alei147258/DynaComFormer-main.
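The abstract does not give implementation details, but the two mechanisms it describes can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration only: the module names, the offset-prediction head, and the two-way attention layout are hypothetical stand-ins, not the authors' architecture (their actual code is at the GitHub repository linked above). The first module shows dynamically guided sampling in the spirit of BASM, where predicted offsets resample one modality's feature map to align it with the other; the second shows C2Fusion-style cross-modal attention, where each modality attends to the other before merging.

```python
# A minimal sketch of the two ideas in the abstract, NOT the paper's code.
# All names and design choices here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicSamplingAlign(nn.Module):
    """BASM-style dynamically guided sampling (assumed design): offsets
    predicted from both modalities resample the source feature map so it
    lines up with the reference modality."""

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical offset head; the paper's actual head may differ.
        self.offset_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, ref: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
        b, _, h, w = ref.shape
        # Per-pixel (x, y) offsets in normalized [-1, 1] grid coordinates.
        offsets = self.offset_head(torch.cat([ref, src], dim=1))
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, h, device=ref.device),
            torch.linspace(-1.0, 1.0, w, device=ref.device),
            indexing="ij",
        )
        base_grid = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        grid = base_grid + offsets.permute(0, 2, 3, 1)
        # Resample the source features at the dynamically shifted positions.
        return F.grid_sample(src, grid, align_corners=True)


class CrossAttentionFusion(nn.Module):
    """C2Fusion-style interaction (assumed design): each modality attends
    to the other, and the two enhanced streams are merged."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.rgb_attends_ir = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.ir_attends_rgb = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.merge = nn.Linear(2 * channels, channels)

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb.shape
        rgb_seq = rgb.flatten(2).transpose(1, 2)  # (B, H*W, C)
        ir_seq = ir.flatten(2).transpose(1, 2)
        rgb_enh, _ = self.rgb_attends_ir(rgb_seq, ir_seq, ir_seq)
        ir_enh, _ = self.ir_attends_rgb(ir_seq, rgb_seq, rgb_seq)
        fused = self.merge(torch.cat([rgb_enh, ir_enh], dim=-1))
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    rgb = torch.randn(1, 32, 40, 40)  # toy RGB feature map
    ir = torch.randn(1, 32, 40, 40)   # toy IR feature map
    ir_aligned = DynamicSamplingAlign(32)(rgb, ir)
    fused = CrossAttentionFusion(32)(rgb, ir_aligned)
    print(fused.shape)  # torch.Size([1, 32, 40, 40])
```

The design intuition, as the abstract frames it: offset-based resampling (as in deformable-convolution-style methods) corrects spatial misalignment before fusion, so the subsequent cross-attention mixes features that actually correspond to the same scene locations.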
Journal introduction:
Optics & Laser Technology aims to provide a vehicle for the publication of a broad range of high-quality research and review papers in those fields of scientific and engineering research appertaining to the development and application of the technology of optics and lasers. Papers describing original work in these areas are submitted to rigorous refereeing prior to acceptance for publication.
The scope of Optics & Laser Technology encompasses, but is not restricted to, the following areas:
•developments in all types of lasers
•developments in optoelectronic devices and photonics
•developments in new photonics and optical concepts
•developments in conventional optics, optical instruments and components
•techniques of optical metrology, including interferometry and optical fibre sensors
•LIDAR and other non-contact optical measurement techniques, including optical methods in heat and fluid flow
•applications of lasers to materials processing, optical NDT, display (including holography) and optical communication
•research and development in the field of laser safety including studies of hazards resulting from the applications of lasers (laser safety, hazards of laser fume)
•developments in optical computing and optical information processing
•developments in new optical materials
•developments in new optical characterization methods and techniques
•developments in quantum optics
•developments in light assisted micro and nanofabrication methods and techniques
•developments in nanophotonics and biophotonics
•developments in image processing and imaging systems