Xiaoyang Zhang, Chengpei Xu, Guodong Fan, Zhen Hua, Jinjiang Li, Jingchun Zhou
{"title":"FSCMF:一种用于可见光和红外图像融合的双分支频率-空间联合感知跨模态网络","authors":"Xiaoyang Zhang , Chengpei Xu , Guodong Fan , Zhen Hua , Jinjiang Li , Jingchun Zhou","doi":"10.1016/j.neucom.2025.130376","DOIUrl":null,"url":null,"abstract":"<div><div>Existing image fusion methods face limitations in deep feature modeling and high-frequency information enhancement, leading to detail loss and reduced target saliency in complex scenarios. To address these issues, this paper proposes a Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network (FSCMF), which integrates local details, global context, and frequency-domain information through a dual-branch architecture to enhance multimodal feature complementarity. Specifically, FSCMF combines CNN and Transformer in a dual-branch design, where the CNN branch focuses on extracting local structures and texture details, while the Transformer branch captures long-range dependencies to improve global consistency. To further optimize feature representation, we introduce the Frequency-Spatial Adaptive Attention Module (FSAA), in which the frequency domain branch enhances high-frequency components to improve edge sharpness, while the spatial domain branch adaptively refines salient region features, ensuring a dynamic balance between global and local information. Additionally, we propose the Weighted Cross-Spectral Feature Fusion Module (WCSFF) to enhance cross-modality feature interaction through adaptive weighting, thereby improving detail integrity and semantic consistency in the fused image. A maximum frequency loss function is further incorporated to ensure the preservation of critical frequency components. Extensive experiments on three public datasets — MSRS, M<span><math><msup><mrow></mrow><mrow><mn>3</mn></mrow></msup></math></span>FD, and LLVIP — demonstrate that FSCMF outperforms existing methods in both qualitative and quantitative evaluations, producing fusion results with higher visual consistency and better information retention. Furthermore, additional experiments on object detection and semantic segmentation validate FSCMF’s potential in high-level computer vision tasks, highlighting its broad application value. The code of FSCMF is available at <span><span>https://github.com/boshizhang123/FSCMF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"641 ","pages":"Article 130376"},"PeriodicalIF":5.5000,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FSCMF: A Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network for visible and infrared image fusion\",\"authors\":\"Xiaoyang Zhang , Chengpei Xu , Guodong Fan , Zhen Hua , Jinjiang Li , Jingchun Zhou\",\"doi\":\"10.1016/j.neucom.2025.130376\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Existing image fusion methods face limitations in deep feature modeling and high-frequency information enhancement, leading to detail loss and reduced target saliency in complex scenarios. To address these issues, this paper proposes a Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network (FSCMF), which integrates local details, global context, and frequency-domain information through a dual-branch architecture to enhance multimodal feature complementarity. 
Specifically, FSCMF combines CNN and Transformer in a dual-branch design, where the CNN branch focuses on extracting local structures and texture details, while the Transformer branch captures long-range dependencies to improve global consistency. To further optimize feature representation, we introduce the Frequency-Spatial Adaptive Attention Module (FSAA), in which the frequency domain branch enhances high-frequency components to improve edge sharpness, while the spatial domain branch adaptively refines salient region features, ensuring a dynamic balance between global and local information. Additionally, we propose the Weighted Cross-Spectral Feature Fusion Module (WCSFF) to enhance cross-modality feature interaction through adaptive weighting, thereby improving detail integrity and semantic consistency in the fused image. A maximum frequency loss function is further incorporated to ensure the preservation of critical frequency components. Extensive experiments on three public datasets — MSRS, M<span><math><msup><mrow></mrow><mrow><mn>3</mn></mrow></msup></math></span>FD, and LLVIP — demonstrate that FSCMF outperforms existing methods in both qualitative and quantitative evaluations, producing fusion results with higher visual consistency and better information retention. Furthermore, additional experiments on object detection and semantic segmentation validate FSCMF’s potential in high-level computer vision tasks, highlighting its broad application value. The code of FSCMF is available at <span><span>https://github.com/boshizhang123/FSCMF</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"641 \",\"pages\":\"Article 130376\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-05-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225010483\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225010483","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
FSCMF: A Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network for visible and infrared image fusion
Existing image fusion methods face limitations in deep feature modeling and high-frequency information enhancement, leading to detail loss and reduced target saliency in complex scenarios. To address these issues, this paper proposes a Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network (FSCMF), which integrates local details, global context, and frequency-domain information through a dual-branch architecture to enhance multimodal feature complementarity. Specifically, FSCMF combines CNN and Transformer in a dual-branch design, where the CNN branch focuses on extracting local structures and texture details, while the Transformer branch captures long-range dependencies to improve global consistency. To further optimize feature representation, we introduce the Frequency-Spatial Adaptive Attention Module (FSAA), in which the frequency domain branch enhances high-frequency components to improve edge sharpness, while the spatial domain branch adaptively refines salient region features, ensuring a dynamic balance between global and local information. Additionally, we propose the Weighted Cross-Spectral Feature Fusion Module (WCSFF) to enhance cross-modality feature interaction through adaptive weighting, thereby improving detail integrity and semantic consistency in the fused image. A maximum frequency loss function is further incorporated to ensure the preservation of critical frequency components. Extensive experiments on three public datasets — MSRS, M3FD, and LLVIP — demonstrate that FSCMF outperforms existing methods in both qualitative and quantitative evaluations, producing fusion results with higher visual consistency and better information retention. Furthermore, additional experiments on object detection and semantic segmentation validate FSCMF's potential in high-level computer vision tasks, highlighting its broad application value. The code of FSCMF is available at https://github.com/boshizhang123/FSCMF.
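To make the frequency-domain ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) FFT-based high-frequency amplification, one plausible reading of what the FSAA frequency branch does to improve edge sharpness, and (b) a spectral loss that pushes the fused image's spectrum toward the element-wise maximum of the source spectra, one plausible reading of the "maximum frequency loss". The class and function names (HighFreqEnhance, max_frequency_loss), the cutoff/gain parameters, and the exact formulas are illustrative assumptions, not the authors' implementation; consult https://github.com/boshizhang123/FSCMF for the reference code.

```python
# Hedged sketch of frequency-branch enhancement and a max-frequency loss.
# All names and formulas here are assumptions for illustration only.
import torch
import torch.nn as nn


class HighFreqEnhance(nn.Module):
    """Assumed behavior of an FSAA-style frequency branch: boost
    high-frequency FFT components to sharpen edges."""

    def __init__(self, cutoff: float = 0.1, gain: float = 1.5):
        super().__init__()
        self.cutoff = cutoff  # radius (fraction of min(H, W)) treated as "low"
        self.gain = gain      # multiplicative boost outside the low band

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Centered 2-D spectrum of each channel.
        spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        # Radial mask separating low from high frequencies.
        yy, xx = torch.meshgrid(
            torch.arange(h, device=x.device) - h // 2,
            torch.arange(w, device=x.device) - w // 2,
            indexing="ij",
        )
        radius = torch.sqrt(yy.float() ** 2 + xx.float() ** 2)
        low = (radius <= self.cutoff * min(h, w)).float()
        # Keep low frequencies, amplify everything else.
        spec = spec * (low + self.gain * (1.0 - low))
        return torch.fft.ifft2(
            torch.fft.ifftshift(spec, dim=(-2, -1)), norm="ortho"
        ).real


def max_frequency_loss(fused, vis, ir):
    """Assumed form of the maximum frequency loss: L1 distance between the
    fused spectrum magnitude and the element-wise max of the source spectra."""
    f_f = torch.fft.fft2(fused, norm="ortho").abs()
    f_v = torch.fft.fft2(vis, norm="ortho").abs()
    f_i = torch.fft.fft2(ir, norm="ortho").abs()
    return torch.mean(torch.abs(f_f - torch.maximum(f_v, f_i)))


if __name__ == "__main__":
    vis = torch.rand(1, 1, 64, 64)   # stand-in visible image
    ir = torch.rand(1, 1, 64, 64)    # stand-in infrared image
    fused = HighFreqEnhance()(0.5 * (vis + ir))
    print(max_frequency_loss(fused, vis, ir).item())
```

The element-wise spectral maximum encourages the fused image to retain the strongest frequency response from either modality, which matches the abstract's stated goal of preserving critical frequency components; the true loss may differ in normalization or distance metric.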
Journal Introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing, covering neurocomputing theory, practice, and applications.