Xiaoyang Zhang, Chengpei Xu, Guodong Fan, Zhen Hua, Jinjiang Li, Jingchun Zhou
{"title":"FSCMF:一种用于可见光和红外图像融合的双分支频率-空间联合感知跨模态网络","authors":"Xiaoyang Zhang , Chengpei Xu , Guodong Fan , Zhen Hua , Jinjiang Li , Jingchun Zhou","doi":"10.1016/j.neucom.2025.130376","DOIUrl":null,"url":null,"abstract":"<div><div>Existing image fusion methods face limitations in deep feature modeling and high-frequency information enhancement, leading to detail loss and reduced target saliency in complex scenarios. To address these issues, this paper proposes a Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network (FSCMF), which integrates local details, global context, and frequency-domain information through a dual-branch architecture to enhance multimodal feature complementarity. Specifically, FSCMF combines CNN and Transformer in a dual-branch design, where the CNN branch focuses on extracting local structures and texture details, while the Transformer branch captures long-range dependencies to improve global consistency. To further optimize feature representation, we introduce the Frequency-Spatial Adaptive Attention Module (FSAA), in which the frequency domain branch enhances high-frequency components to improve edge sharpness, while the spatial domain branch adaptively refines salient region features, ensuring a dynamic balance between global and local information. Additionally, we propose the Weighted Cross-Spectral Feature Fusion Module (WCSFF) to enhance cross-modality feature interaction through adaptive weighting, thereby improving detail integrity and semantic consistency in the fused image. A maximum frequency loss function is further incorporated to ensure the preservation of critical frequency components. Extensive experiments on three public datasets — MSRS, M<span><math><msup><mrow></mrow><mrow><mn>3</mn></mrow></msup></math></span>FD, and LLVIP — demonstrate that FSCMF outperforms existing methods in both qualitative and quantitative evaluations, producing fusion results with higher visual consistency and better information retention. Furthermore, additional experiments on object detection and semantic segmentation validate FSCMF’s potential in high-level computer vision tasks, highlighting its broad application value. The code of FSCMF is available at <span><span>https://github.com/boshizhang123/FSCMF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"641 ","pages":"Article 130376"},"PeriodicalIF":5.5000,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FSCMF: A Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network for visible and infrared image fusion\",\"authors\":\"Xiaoyang Zhang , Chengpei Xu , Guodong Fan , Zhen Hua , Jinjiang Li , Jingchun Zhou\",\"doi\":\"10.1016/j.neucom.2025.130376\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Existing image fusion methods face limitations in deep feature modeling and high-frequency information enhancement, leading to detail loss and reduced target saliency in complex scenarios. To address these issues, this paper proposes a Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network (FSCMF), which integrates local details, global context, and frequency-domain information through a dual-branch architecture to enhance multimodal feature complementarity. 
Specifically, FSCMF combines CNN and Transformer in a dual-branch design, where the CNN branch focuses on extracting local structures and texture details, while the Transformer branch captures long-range dependencies to improve global consistency. To further optimize feature representation, we introduce the Frequency-Spatial Adaptive Attention Module (FSAA), in which the frequency domain branch enhances high-frequency components to improve edge sharpness, while the spatial domain branch adaptively refines salient region features, ensuring a dynamic balance between global and local information. Additionally, we propose the Weighted Cross-Spectral Feature Fusion Module (WCSFF) to enhance cross-modality feature interaction through adaptive weighting, thereby improving detail integrity and semantic consistency in the fused image. A maximum frequency loss function is further incorporated to ensure the preservation of critical frequency components. Extensive experiments on three public datasets — MSRS, M<span><math><msup><mrow></mrow><mrow><mn>3</mn></mrow></msup></math></span>FD, and LLVIP — demonstrate that FSCMF outperforms existing methods in both qualitative and quantitative evaluations, producing fusion results with higher visual consistency and better information retention. Furthermore, additional experiments on object detection and semantic segmentation validate FSCMF’s potential in high-level computer vision tasks, highlighting its broad application value. The code of FSCMF is available at <span><span>https://github.com/boshizhang123/FSCMF</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"641 \",\"pages\":\"Article 130376\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-05-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225010483\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225010483","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
FSCMF: A Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network for visible and infrared image fusion
Existing image fusion methods face limitations in deep feature modeling and high-frequency information enhancement, leading to detail loss and reduced target saliency in complex scenarios. To address these issues, this paper proposes a Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network (FSCMF), which integrates local details, global context, and frequency-domain information through a dual-branch architecture to enhance multimodal feature complementarity. Specifically, FSCMF combines CNN and Transformer in a dual-branch design, where the CNN branch focuses on extracting local structures and texture details, while the Transformer branch captures long-range dependencies to improve global consistency. To further optimize feature representation, we introduce the Frequency-Spatial Adaptive Attention Module (FSAA), in which the frequency domain branch enhances high-frequency components to improve edge sharpness, while the spatial domain branch adaptively refines salient region features, ensuring a dynamic balance between global and local information. Additionally, we propose the Weighted Cross-Spectral Feature Fusion Module (WCSFF) to enhance cross-modality feature interaction through adaptive weighting, thereby improving detail integrity and semantic consistency in the fused image. A maximum frequency loss function is further incorporated to ensure the preservation of critical frequency components. Extensive experiments on three public datasets — MSRS, M3FD, and LLVIP — demonstrate that FSCMF outperforms existing methods in both qualitative and quantitative evaluations, producing fusion results with higher visual consistency and better information retention. Furthermore, additional experiments on object detection and semantic segmentation validate FSCMF's potential in high-level computer vision tasks, highlighting its broad application value. The code of FSCMF is available at https://github.com/boshizhang123/FSCMF.
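To make the frequency-domain ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) FFT-based high-frequency amplification, one plausible reading of what the FSAA frequency branch does to improve edge sharpness, and (b) a spectral loss that pushes the fused image's spectrum toward the element-wise maximum of the source spectra, one plausible reading of the "maximum frequency loss". The class and function names (HighFreqEnhance, max_frequency_loss), the cutoff/gain parameters, and the exact formulas are illustrative assumptions, not the authors' implementation; consult https://github.com/boshizhang123/FSCMF for the reference code.

```python
# Hedged sketch of frequency-branch enhancement and a max-frequency loss.
# All names and formulas here are assumptions for illustration only.
import torch
import torch.nn as nn


class HighFreqEnhance(nn.Module):
    """Assumed behavior of an FSAA-style frequency branch: boost
    high-frequency FFT components to sharpen edges."""

    def __init__(self, cutoff: float = 0.1, gain: float = 1.5):
        super().__init__()
        self.cutoff = cutoff  # radius (fraction of min(H, W)) treated as "low"
        self.gain = gain      # multiplicative boost outside the low band

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Centered 2-D spectrum of each channel.
        spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        # Radial mask separating low from high frequencies.
        yy, xx = torch.meshgrid(
            torch.arange(h, device=x.device) - h // 2,
            torch.arange(w, device=x.device) - w // 2,
            indexing="ij",
        )
        radius = torch.sqrt(yy.float() ** 2 + xx.float() ** 2)
        low = (radius <= self.cutoff * min(h, w)).float()
        # Keep low frequencies, amplify everything else.
        spec = spec * (low + self.gain * (1.0 - low))
        return torch.fft.ifft2(
            torch.fft.ifftshift(spec, dim=(-2, -1)), norm="ortho"
        ).real


def max_frequency_loss(fused, vis, ir):
    """Assumed form of the maximum frequency loss: L1 distance between the
    fused spectrum magnitude and the element-wise max of the source spectra."""
    f_f = torch.fft.fft2(fused, norm="ortho").abs()
    f_v = torch.fft.fft2(vis, norm="ortho").abs()
    f_i = torch.fft.fft2(ir, norm="ortho").abs()
    return torch.mean(torch.abs(f_f - torch.maximum(f_v, f_i)))


if __name__ == "__main__":
    vis = torch.rand(1, 1, 64, 64)   # stand-in visible image
    ir = torch.rand(1, 1, 64, 64)    # stand-in infrared image
    fused = HighFreqEnhance()(0.5 * (vis + ir))
    print(max_frequency_loss(fused, vis, ir).item())
```

The element-wise spectral maximum encourages the fused image to retain the strongest frequency response from either modality, which matches the abstract's stated goal of preserving critical frequency components; the true loss may differ in normalization or distance metric.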
Journal Introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing, covering neurocomputing theory, practice, and applications.