A collaborative spatial–frequency learning network for infrared and visible image fusion
Hongbin Yu, Xiangcan Du, Wei Song, Haojie Zhou, Junyi Zhang
Pattern Recognition, Volume 172, Article 112480. Published 2025-09-26. DOI: 10.1016/j.patcog.2025.112480
Citations: 0
Abstract
Most existing deep fusion models operate predominantly in the spatial domain, which limits their ability to preserve texture details effectively. In contrast, methods that incorporate frequency-domain information often suffer from inadequate interaction with spatial-domain features, thereby constraining overall fusion performance. To address these limitations, we propose a Collaborative Spatial–Frequency Learning Network (CSFNet) for infrared and visible image fusion. In the frequency-domain learning branch, we introduce a frequency refinement module based on the wavelet transform to enable cross-band feature interaction and facilitate effective multi-scale feature fusion. In the spatial-domain branch, we embed a learnable low-rank decomposition model that extracts low-rank features from infrared images and sparse detail features from visible images, forming the basis of a dedicated spatial feature extraction module. Additionally, an information aggregation module is designed to learn complementary representations and integrate cross-domain features efficiently. To validate the effectiveness of the proposed approach, we conducted extensive experiments on three publicly available datasets (MSRS, TNO, and RoadScene) and compared CSFNet with sixteen state-of-the-art (SOTA) fusion methods. On the MSRS dataset, CSFNet achieved favorable results, with means and standard deviations of SF = 12.2108 ± 3.8706, VIF = 1.0232 ± 0.1397, Qabf = 0.7112 ± 0.0397, SSIM = 0.6909 ± 0.0859, PSNR = 17.6517 ± 3.8767, and AG = 4.0243 ± 1.5465. The minimum performance improvement over SOTA methods was 1.64%, while the maximum gain reached 108.82%. Furthermore, CSFNet demonstrated superior performance on a downstream semantic segmentation task.
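The frequency-domain branch builds on a wavelet decomposition of the inputs. As a rough, non-learned illustration of the band split involved, the Python sketch below fuses two grayscale images by averaging the low-frequency sub-band and keeping the larger-magnitude coefficient in each high-frequency sub-band; the paper's actual frequency refinement module learns this cross-band interaction, so the mixing rules here are hypothetical stand-ins.

```python
# A minimal, non-learned sketch of the wavelet band split that a
# frequency-refinement module could build on. The fixed averaging and
# max-magnitude rules below are illustrative stand-ins for CSFNet's
# learned cross-band interaction.
import numpy as np
import pywt

def wavelet_band_mix(ir: np.ndarray, vis: np.ndarray, wavelet: str = "haar") -> np.ndarray:
    """Fuse two grayscale images by mixing their wavelet sub-bands."""
    ll_i, (lh_i, hl_i, hh_i) = pywt.dwt2(ir, wavelet)
    ll_v, (lh_v, hl_v, hh_v) = pywt.dwt2(vis, wavelet)
    # Low-frequency band: average the coarse structure of both inputs.
    ll = 0.5 * (ll_i + ll_v)
    # High-frequency bands: keep the larger-magnitude coefficient, a common
    # non-learned proxy for "preserve the sharper detail".
    pick = lambda a, b: np.where(np.abs(a) >= np.abs(b), a, b)
    bands = (pick(lh_i, lh_v), pick(hl_i, hl_v), pick(hh_i, hh_v))
    return pywt.idwt2((ll, bands), wavelet)
```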
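Similarly, the spatial-domain branch rests on a low-rank-plus-sparse modeling assumption: infrared images contribute a smooth, low-rank base layer, while visible images contribute sparse detail. A classical, non-learned sketch of that split (truncated SVD for the base, soft-thresholding for the sparse residual) follows; CSFNet's decomposition is learnable, so this is only meant to make the assumption concrete.

```python
# A non-learned sketch of low-rank + sparse decomposition. CSFNet embeds a
# *learnable* decomposition, so this classical single-pass variant only
# illustrates the modeling assumption (rank-r base + sparse detail).
import numpy as np

def lowrank_sparse_split(x: np.ndarray, rank: int = 4, thresh: float = 0.05):
    """Split a 2-D image into a rank-`rank` base layer and a sparse residual."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]  # truncated-SVD base
    residual = x - low_rank
    # Soft-threshold the residual so only salient details survive as "sparse".
    sparse = np.sign(residual) * np.maximum(np.abs(residual) - thresh, 0.0)
    return low_rank, sparse
```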
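For reference, two of the reported metrics, spatial frequency (SF) and average gradient (AG), have simple closed forms. The implementations below follow one common convention from the fusion literature; exact normalizations vary slightly between papers.

```python
# Reference implementations of SF (spatial frequency) and AG (average
# gradient) for a grayscale image, following one common convention.
import numpy as np

def spatial_frequency(img: np.ndarray) -> float:
    # RMS of differences between adjacent columns and adjacent rows.
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))

def average_gradient(img: np.ndarray) -> float:
    dx = np.diff(img, axis=1)[:-1, :]  # horizontal gradient, cropped to align
    dy = np.diff(img, axis=0)[:, :-1]  # vertical gradient, cropped to align
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))
```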
Journal overview:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.