Lei Zhang, Keyan Dong, Yansong Song, Gong Zhang, Gangqi Yan, Tianci Liu, Yanbo Wang, Yuqing Li, Xinhang Li
Optics and Laser Technology, Volume 192, Article 113996 (published 2025-09-26). DOI: 10.1016/j.optlastec.2025.113996
Multimodal object detection method based on bidirectional dynamic sampling and adaptive Cross-Modal fusion
The integration of visible (RGB) and infrared (IR) images has garnered extensive attention in fields such as object detection, object tracking, and scene segmentation. The complementary perceptual information of these two modalities enables robust detection in all-weather and complex environments. Existing RGB-IR fusion methods have made notable progress in enhancing detection accuracy but still suffer from modality alignment errors and insufficient feature fusion, which severely undermine detection performance. To address these challenges, we propose a model named Dynamic Communication Transformer (DynaComFormer), which simultaneously resolves the problems of modality alignment errors and insufficient fusion precision. Within DynaComFormer, we design two modules: the Bidirectional Adaptive Sampling Module (BASM) and the Cross-Complementary Fusion Module (C2Fusion module). The BASM improves the feature alignment accuracy between modalities through a dynamically guided sampling strategy while effectively reducing computational complexity. The C2Fusion module leverages self-attention mechanisms to establish efficient information interaction channels between the two modalities, achieving complementary fusion of deep semantic features. We conducted experimental analyses in complex environments such as rainy weather, strong light, and smoke. The results demonstrate that our detection performance surpasses that of other fusion and non-fusion algorithms by 2% to 10%. The code is available at https://github.com/alei147258/DynaComFormer-main.
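The abstract does not give implementation details, but the two mechanisms it describes can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration only: the module names, the offset-prediction head, and the two-way attention layout are hypothetical stand-ins, not the authors' architecture (their actual code is at the GitHub repository linked above). The first module shows dynamically guided sampling in the spirit of BASM, where predicted offsets resample one modality's feature map to align it with the other; the second shows C2Fusion-style cross-modal attention, where each modality attends to the other before merging.

```python
# A minimal sketch of the two ideas in the abstract, NOT the paper's code.
# All names and design choices here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicSamplingAlign(nn.Module):
    """BASM-style dynamically guided sampling (assumed design): offsets
    predicted from both modalities resample the source feature map so it
    lines up with the reference modality."""

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical offset head; the paper's actual head may differ.
        self.offset_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, ref: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
        b, _, h, w = ref.shape
        # Per-pixel (x, y) offsets in normalized [-1, 1] grid coordinates.
        offsets = self.offset_head(torch.cat([ref, src], dim=1))
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, h, device=ref.device),
            torch.linspace(-1.0, 1.0, w, device=ref.device),
            indexing="ij",
        )
        base_grid = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        grid = base_grid + offsets.permute(0, 2, 3, 1)
        # Resample the source features at the dynamically shifted positions.
        return F.grid_sample(src, grid, align_corners=True)


class CrossAttentionFusion(nn.Module):
    """C2Fusion-style interaction (assumed design): each modality attends
    to the other, and the two enhanced streams are merged."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.rgb_attends_ir = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.ir_attends_rgb = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.merge = nn.Linear(2 * channels, channels)

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb.shape
        rgb_seq = rgb.flatten(2).transpose(1, 2)  # (B, H*W, C)
        ir_seq = ir.flatten(2).transpose(1, 2)
        rgb_enh, _ = self.rgb_attends_ir(rgb_seq, ir_seq, ir_seq)
        ir_enh, _ = self.ir_attends_rgb(ir_seq, rgb_seq, rgb_seq)
        fused = self.merge(torch.cat([rgb_enh, ir_enh], dim=-1))
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    rgb = torch.randn(1, 32, 40, 40)  # toy RGB feature map
    ir = torch.randn(1, 32, 40, 40)   # toy IR feature map
    ir_aligned = DynamicSamplingAlign(32)(rgb, ir)
    fused = CrossAttentionFusion(32)(rgb, ir_aligned)
    print(fused.shape)  # torch.Size([1, 32, 40, 40])
```

The design intuition, as the abstract frames it: offset-based resampling (as in deformable-convolution-style methods) corrects spatial misalignment before fusion, so the subsequent cross-attention mixes features that actually correspond to the same scene locations.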
Journal introduction:
Optics & Laser Technology aims to provide a vehicle for the publication of a broad range of high-quality research and review papers in those fields of scientific and engineering research appertaining to the development and application of the technology of optics and lasers. Papers describing original work in these areas are submitted to rigorous refereeing prior to acceptance for publication.
The scope of Optics & Laser Technology encompasses, but is not restricted to, the following areas:
•developments in all types of lasers
•developments in optoelectronic devices and photonics
•developments in new photonics and optical concepts
•developments in conventional optics, optical instruments and components
•techniques of optical metrology, including interferometry and optical fibre sensors
•LIDAR and other non-contact optical measurement techniques, including optical methods in heat and fluid flow
•applications of lasers to materials processing, optical NDT, display (including holography) and optical communication
•research and development in the field of laser safety including studies of hazards resulting from the applications of lasers (laser safety, hazards of laser fume)
•developments in optical computing and optical information processing
•developments in new optical materials
•developments in new optical characterization methods and techniques
•developments in quantum optics
•developments in light assisted micro and nanofabrication methods and techniques
•developments in nanophotonics and biophotonics
•developments in image processing and imaging systems