{"title":"One-stream one-stage pure transformer for infrared and visible image fusion","authors":"Qianhong Zhang, Qiao Liu, Di Yuan, Yunpeng Liu","doi":"10.1016/j.infrared.2025.106067","DOIUrl":null,"url":null,"abstract":"<div><div>Existing infrared and visible image fusion methods usually use two identical backbones to extract deep features from the source images and then fuse them with a manually designed fusion strategy. However, this framework overlooks the importance of feature interaction during the feature extraction stage and does not fully capture the complementary information between the source images. In addition, it requires the design of complex fusion strategies, which limits its robustness and generalization to different scenarios. To this end, we propose a one-stream one-stage pure Transformer-based fusion framework, which simplifies feature extraction and fusion into a unified one-stream pipeline. Specifically, the proposed method consists of a one-stream fusion network and a decomposition network. The fusion network uses several Swin Transformer blocks to extract and fuse the features of the two modalities simultaneously. Thanks to the sliding-window-based multi-head attention, the fusion network can acquire local features and global dependencies and seamlessly model their contextual relationships. Due to the lack of effective supervision signals, the fusion network struggles to fully transfer important information from the source images, which can easily lead to artifacts. To eliminate these artifacts and simultaneously force the fused image to contain richer information, we design a simple decomposition network that decomposes the fusion result back into the source images under consistency constraints. Extensive comparative and ablation experiments on four image fusion benchmarks demonstrate that our method achieves favorable results. In addition, the results on downstream tasks, including object detection and semantic segmentation, further show the effectiveness of the proposed method.</div></div>","PeriodicalId":13549,"journal":{"name":"Infrared Physics & Technology","volume":"151 ","pages":"Article 106067"},"PeriodicalIF":3.4000,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Infrared Physics & Technology","FirstCategoryId":"101","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1350449525003603","RegionNum":3,"RegionCategory":"Physics and Astronomy","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"INSTRUMENTS & INSTRUMENTATION","Score":null,"Total":0}
Citations: 0
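The paper's implementation is not reproduced here, but the core idea in the abstract — a single shared pipeline that extracts and fuses both modalities at once, plus a decomposition network whose reconstruction of the sources acts as a consistency constraint — can be sketched conceptually. All names below are hypothetical, and identity functions stand in for the actual Swin Transformer blocks and the learned decomposition network:

```python
import numpy as np

def one_stream_fuse(ir, vis, blocks):
    """Process both modalities jointly through one shared stack of blocks
    (a stand-in for the paper's Swin Transformer stages), so feature
    extraction and fusion happen in a single stream."""
    x = np.stack([ir, vis], axis=0)   # joint stream, shape (2, H, W)
    for block in blocks:              # interaction happens at every stage
        x = block(x)
    return x.mean(axis=0)             # collapse the stream to one fused image

def decomposition_loss(fused, ir, vis, decompose):
    """Consistency constraint: decomposing the fused image back into the
    two source images should reproduce them (MSE on both)."""
    ir_hat, vis_hat = decompose(fused)
    return np.mean((ir_hat - ir) ** 2) + np.mean((vis_hat - vis) ** 2)

# Toy stand-ins: identity blocks and a trivial decomposition.
blocks = [lambda x: x, lambda x: x]
decompose = lambda f: (f, f)

ir = np.ones((4, 4))    # synthetic infrared image
vis = np.zeros((4, 4))  # synthetic visible image
fused = one_stream_fuse(ir, vis, blocks)
loss = decomposition_loss(fused, ir, vis, decompose)
```

The design point illustrated is that, unlike two-backbone frameworks, there is no hand-crafted fusion step: the same blocks that extract features also mix the modalities, and the decomposition loss supplies the supervision signal the abstract says is otherwise lacking.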
Journal introduction:
The Journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region.
Its core topics can be summarized as the generation, propagation, and detection of infrared radiation; the associated optics, materials, and devices; and its use in all fields of science, industry, engineering and medicine.
Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.