{"title":"红外可见成像中鲁棒多模态目标检测的自适应跨模态融合","authors":"Xiangping Wu, Bingxuan Zhang, Wangjun Wan","doi":"10.1007/s10043-025-00977-w","DOIUrl":null,"url":null,"abstract":"<p>Given the challenges faced by object detection methods that rely on visible light in complex environments, many researchers have begun to explore the combination of infrared and visible imaging for multi-modal detection. Existing results show that multi-modal fusion has proven effective for improving object detection outcomes. However, most current multi-modal detection methods rely on fixed-parameter feature fusion techniques, failing to account for the imaging differences across diverse environments and the complementary information between different modalities. In this paper, we propose a multi-modal object detection method based on adaptive weight fusion, utilizing the dual-stream framework to extract features from both modalities separately. We design a Cross-Modal Feature Interaction (CMFI) module to integrate global information across modalities and capture long-range dependencies. In addition, we introduce an Adaptive Modal Weight Calculation (AMWC) module, which fully accounts for the characteristics of different modalities in various environments and the complementarity among the modalities. This module dynamically adjusts the fusion weights within the CMFI module based on the input from different modalities. Moreover, a novel loss function is introduced to regulate the internal parameter adjustments of the AMWC module. We conduct extensive experiments on three representative datasets, using mAP@0.5 and mAP@0.5:0.95 as evaluation metrics. Our model achieved 79.1% and 40.6% on the FLIR dataset, 81.9% and 52.1% on M3FD, and 73.5% and 32.5% on KAIST.</p>","PeriodicalId":722,"journal":{"name":"Optical Review","volume":"76 1","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2025-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive cross-modal fusion for robust multi-modal object detection in infrared–visible imaging\",\"authors\":\"Xiangping Wu, Bingxuan Zhang, Wangjun Wan\",\"doi\":\"10.1007/s10043-025-00977-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Given the challenges faced by object detection methods that rely on visible light in complex environments, many researchers have begun to explore the combination of infrared and visible imaging for multi-modal detection. Existing results show that multi-modal fusion has proven effective for improving object detection outcomes. However, most current multi-modal detection methods rely on fixed-parameter feature fusion techniques, failing to account for the imaging differences across diverse environments and the complementary information between different modalities. In this paper, we propose a multi-modal object detection method based on adaptive weight fusion, utilizing the dual-stream framework to extract features from both modalities separately. We design a Cross-Modal Feature Interaction (CMFI) module to integrate global information across modalities and capture long-range dependencies. In addition, we introduce an Adaptive Modal Weight Calculation (AMWC) module, which fully accounts for the characteristics of different modalities in various environments and the complementarity among the modalities. This module dynamically adjusts the fusion weights within the CMFI module based on the input from different modalities. 
Moreover, a novel loss function is introduced to regulate the internal parameter adjustments of the AMWC module. We conduct extensive experiments on three representative datasets, using mAP@0.5 and mAP@0.5:0.95 as evaluation metrics. Our model achieved 79.1% and 40.6% on the FLIR dataset, 81.9% and 52.1% on M3FD, and 73.5% and 32.5% on KAIST.</p>\",\"PeriodicalId\":722,\"journal\":{\"name\":\"Optical Review\",\"volume\":\"76 1\",\"pages\":\"\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2025-05-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Optical Review\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.1007/s10043-025-00977-w\",\"RegionNum\":4,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"OPTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Optical Review","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1007/s10043-025-00977-w","RegionNum":4,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"OPTICS","Score":null,"Total":0}
Adaptive cross-modal fusion for robust multi-modal object detection in infrared–visible imaging
Given the challenges faced by object detection methods that rely on visible light in complex environments, many researchers have begun to explore the combination of infrared and visible imaging for multi-modal detection. Existing results show that multi-modal fusion is effective in improving object detection performance. However, most current multi-modal detection methods rely on fixed-parameter feature fusion techniques and fail to account for the imaging differences across diverse environments and the complementary information between modalities. In this paper, we propose a multi-modal object detection method based on adaptive weight fusion, using a dual-stream framework to extract features from the two modalities separately. We design a Cross-Modal Feature Interaction (CMFI) module to integrate global information across modalities and capture long-range dependencies. In addition, we introduce an Adaptive Modal Weight Calculation (AMWC) module, which accounts for the characteristics of each modality in different environments and the complementarity between the modalities, and dynamically adjusts the fusion weights within the CMFI module based on the inputs from the two modalities. Moreover, a novel loss function is introduced to regulate the internal parameter adjustments of the AMWC module. We conduct extensive experiments on three representative datasets, using mAP@0.5 and mAP@0.5:0.95 as evaluation metrics. Our model achieves 79.1% mAP@0.5 and 40.6% mAP@0.5:0.95 on the FLIR dataset, 81.9% and 52.1% on M3FD, and 73.5% and 32.5% on KAIST.
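The abstract does not detail how the CMFI and AMWC modules are built, but the core idea of input-dependent fusion weights can be illustrated with a minimal sketch. The PyTorch snippet below is an assumption-laden illustration, not the authors' implementation: the class name, the pooling-plus-MLP gating network, and the softmax weighting are all hypothetical choices that merely show how per-modality weights could be predicted from the infrared and visible feature maps and used to mix the two streams.

```python
# Minimal sketch (illustrative assumptions, not the paper's CMFI/AMWC design):
# predict two modality weights from global descriptors of the IR and visible
# feature maps, then fuse the streams with those weights.
import torch
import torch.nn as nn


class AdaptiveWeightFusion(nn.Module):
    """Hypothetical adaptive fusion of infrared and visible feature maps."""

    def __init__(self, channels: int):
        super().__init__()
        # Global average pooling gives one descriptor per channel and modality.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Small MLP maps the concatenated descriptors to two modality logits.
        self.weight_net = nn.Sequential(
            nn.Linear(2 * channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, 2),
        )

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = feat_ir.shape
        # (B, 2C) descriptor built from both modalities.
        desc = torch.cat(
            [self.pool(feat_ir).view(b, c), self.pool(feat_vis).view(b, c)], dim=1
        )
        # Softmax keeps the two weights positive and summing to one.
        w = torch.softmax(self.weight_net(desc), dim=1)  # (B, 2)
        w_ir = w[:, 0].view(b, 1, 1, 1)
        w_vis = w[:, 1].view(b, 1, 1, 1)
        return w_ir * feat_ir + w_vis * feat_vis


if __name__ == "__main__":
    fusion = AdaptiveWeightFusion(channels=64)
    ir = torch.randn(2, 64, 80, 80)   # infrared feature map
    vis = torch.randn(2, 64, 80, 80)  # visible feature map
    fused = fusion(ir, vis)
    print(fused.shape)  # torch.Size([2, 64, 80, 80])
```

In the paper, an additional loss term regulates how the AMWC module adjusts its internal parameters; the sketch above omits that supervision and any cross-modal attention used for long-range dependencies.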
Journal introduction:
Optical Review is an international journal published by the Optical Society of Japan. The scope of the journal is:
General and physical optics;
Quantum optics and spectroscopy;
Information optics;
Photonics and optoelectronics;
Biomedical photonics and biological optics;
Lasers;
Nonlinear optics;
Optical systems and technologies;
Optical materials and manufacturing technologies;
Vision;
Infrared and short wavelength optics;
Cross-disciplinary areas such as environmental, energy, food, agriculture and space technologies;
Other optical methods and applications.