Liuyan Shi, Rencan Nie, Jinde Cao, Xuheng Liu, Xiaoli Li
Optics and Laser Technology, Volume 190, Article 113097. DOI: 10.1016/j.optlastec.2025.113097. Published 2025-05-19.
SCMFusion: Semantic Constrained Multi-Scale Fusion Network for infrared and visible image fusion
Due to the distinct characteristics of infrared and visible images, we introduce Semantic Constrained Multi-Scale Fusion (SCMFusion) to balance unique and common features during infrared and visible image fusion (IVIF). This method reduces redundancy and comprehensively represents scenes captured by both modalities. Firstly, the semantic-constrained Frequency-Aware Bidirectional Pyramid (FABP) combines a spatial pyramid, which vertically keeps the channel unchanged and captures a larger receptive field through resolution reduction, with a channel pyramid, which preserves scale consistency and enriches feature expression through increased channels. Subsequently, the extracted features undergo Semantic-Constrained Cross-Modal Cross-Scale Fusion (SC-CCF) for effective information exchange and fusion. Next, the semantic constraints ensure pixel-wise alignment between fused features and original images, integrating modality-specific features and enhancing shared features. Finally, a Reconstruction Block (RB) processes high- and low-frequency components to produce the fused image. Comparative experiments demonstrate that our model outperforms 11 state-of-the-art (SOTA) fusion methods and achieves notable results in object detection.
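The core of the Frequency-Aware Bidirectional Pyramid is the pairing of two feature pyramids: a spatial branch that keeps the channel count fixed while halving resolution (enlarging the receptive field), and a channel branch that keeps resolution fixed while increasing channels (enriching feature expression). The following is a minimal numpy sketch of that dual-pyramid idea only; the function names are hypothetical and a random 1x1 projection stands in for the learned convolutions of the actual network.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling over the spatial axes of a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def spatial_pyramid(x, levels=3):
    """Spatial branch: channels unchanged, resolution halved per level,
    so deeper levels see a larger effective receptive field."""
    feats = [x]
    for _ in range(levels - 1):
        feats.append(avg_pool2(feats[-1]))
    return feats

def channel_pyramid(x, levels=3):
    """Channel branch: resolution unchanged, channels doubled per level.
    A fixed random 1x1 projection stands in for a learned conv layer."""
    rng = np.random.default_rng(0)
    feats = [x]
    for _ in range(levels - 1):
        c = feats[-1].shape[0]
        w = rng.standard_normal((2 * c, c)) / np.sqrt(c)  # 1x1 conv weights
        feats.append(np.einsum('oc,chw->ohw', w, feats[-1]))
    return feats

x = np.random.default_rng(1).standard_normal((8, 64, 64))
sp = spatial_pyramid(x)   # shapes: (8,64,64), (8,32,32), (8,16,16)
ch = channel_pyramid(x)   # shapes: (8,64,64), (16,64,64), (32,64,64)
```

The two branches yield complementary multi-scale features at each level, which the paper's SC-CCF module then exchanges and fuses across modalities and scales.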
About the journal:
Optics & Laser Technology aims to provide a vehicle for the publication of a broad range of high-quality research and review papers in those fields of scientific and engineering research pertaining to the development and application of the technology of optics and lasers. Papers describing original work in these areas undergo rigorous refereeing prior to acceptance for publication.
The scope of Optics & Laser Technology encompasses, but is not restricted to, the following areas:
•development in all types of lasers
•developments in optoelectronic devices and photonics
•developments in new photonics and optical concepts
•developments in conventional optics, optical instruments and components
•techniques of optical metrology, including interferometry and optical fibre sensors
•LIDAR and other non-contact optical measurement techniques, including optical methods in heat and fluid flow
•applications of lasers to materials processing, optical NDT, display (including holography) and optical communication
•research and development in the field of laser safety including studies of hazards resulting from the applications of lasers (laser safety, hazards of laser fume)
•developments in optical computing and optical information processing
•developments in new optical materials
•developments in new optical characterization methods and techniques
•developments in quantum optics
•developments in light assisted micro and nanofabrication methods and techniques
•developments in nanophotonics and biophotonics
•developments in image processing and imaging systems