Multimodal object detection method based on bidirectional dynamic sampling and adaptive Cross-Modal fusion

IF 5.0 · Zone 2 (Physics and Astrophysics) · JCR Q1 (Optics)
Lei Zhang, Keyan Dong, Yansong Song, Gong Zhang, Gangqi Yan, Tianci Liu, Yanbo Wang, Yuqing Li, Xinhang Li
Journal: Optics and Laser Technology, vol. 192, Article 113996
Published: 2025-09-26
DOI: 10.1016/j.optlastec.2025.113996
Source: https://www.sciencedirect.com/science/article/pii/S0030399225015877
Citations: 0

Abstract

The integration of visible (RGB) and infrared (IR) images has garnered extensive attention in fields such as object detection, object tracking, and scene segmentation. The perceptual information of the two modalities is complementary, which enables robust detection in all-weather and complex environments. Existing RGB-IR fusion methods have made notable progress in detection accuracy but still suffer from modality alignment errors and insufficient feature fusion, both of which severely undermine detection performance. To address these challenges, we propose a model named Dynamic Communication Transformer (DynaComFormer), which tackles modality alignment errors and insufficient fusion precision simultaneously. DynaComFormer contains two modules: the Bidirectional Adaptive Sampling Module (BASM) and the Cross-Complementary Fusion Module (C2Fusion module). The BASM improves feature alignment accuracy between modalities through a dynamically guided sampling strategy while reducing computational complexity. The C2Fusion module leverages self-attention mechanisms to establish efficient information interaction channels between the two modalities, achieving complementary fusion of deep semantic features. We conducted experimental analyses in complex environments, such as rainy weather, strong light, and smoke. The results demonstrate that our detection performance surpasses other fusion algorithms and non-fusion algorithms by 2% to 10%. The code is available at https://github.com/alei147258/DynaComFormer-main.
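The abstract does not spell out how the C2Fusion module is built, so the following is only a minimal sketch of the general idea of attention-based information exchange between two modalities, not the authors' implementation. It assumes each modality has already been flattened into a sequence of feature tokens; the function names (`cross_attention`, `bidirectional_fuse`), the single-head attention without learned projections, and the residual-then-concatenate fusion are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens):
    """Let every query token attend over all context tokens.

    Hypothetical single-head attention with no learned weights:
    queries come from one modality, keys and values from the other.
    Returns the attended features and the attention weights.
    """
    d = query_tokens.shape[-1]
    scores = query_tokens @ context_tokens.T / np.sqrt(d)  # (Nq, Nc)
    weights = softmax(scores, axis=-1)                     # rows sum to 1
    return weights @ context_tokens, weights


def bidirectional_fuse(rgb_tokens, ir_tokens):
    """Exchange information in both directions, then concatenate.

    Assumed fusion rule (not taken from the paper): each modality adds
    what it gathered from the other as a residual, and the two enhanced
    sequences are concatenated along the channel axis.
    """
    rgb_from_ir, _ = cross_attention(rgb_tokens, ir_tokens)
    ir_from_rgb, _ = cross_attention(ir_tokens, rgb_tokens)
    rgb_enhanced = rgb_tokens + rgb_from_ir
    ir_enhanced = ir_tokens + ir_from_rgb
    return np.concatenate([rgb_enhanced, ir_enhanced], axis=-1)


# Toy input: 16 spatial tokens with 8 channels per modality.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 8))
ir = rng.standard_normal((16, 8))

fused = bidirectional_fuse(rgb, ir)
print(fused.shape)
```

In a real detector the queries, keys, and values would pass through learned projections and the fused tokens would be reshaped back into a feature map for the detection head; the repository linked in the abstract is the authoritative reference for how the authors do this.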
Source journal
CiteScore: 8.50
Self-citation rate: 10.00%
Articles published: 1060
Review time: 3.4 months
Journal description: Optics & Laser Technology aims to provide a vehicle for the publication of a broad range of high quality research and review papers in those fields of scientific and engineering research appertaining to the development and application of the technology of optics and lasers. Papers describing original work in these areas are submitted to rigorous refereeing prior to acceptance for publication. The scope of Optics & Laser Technology encompasses, but is not restricted to, the following areas:
• development in all types of lasers
• developments in optoelectronic devices and photonics
• developments in new photonics and optical concepts
• developments in conventional optics, optical instruments and components
• techniques of optical metrology, including interferometry and optical fibre sensors
• LIDAR and other non-contact optical measurement techniques, including optical methods in heat and fluid flow
• applications of lasers to materials processing, optical NDT display (including holography) and optical communication
• research and development in the field of laser safety including studies of hazards resulting from the applications of lasers (laser safety, hazards of laser fume)
• developments in optical computing and optical information processing
• developments in new optical materials
• developments in new optical characterization methods and techniques
• developments in quantum optics
• developments in light assisted micro and nanofabrication methods and techniques
• developments in nanophotonics and biophotonics
• developments in imaging processing and systems