DSAFuse: Infrared and visible image fusion via dual-branch spatial adaptive feature extraction
Shixian Shen, Yong Feng, Nianbo Liu, Ming Liu, Yingna Li
Neurocomputing, vol. 616, Article 128957, published 2024-11-20. DOI: 10.1016/j.neucom.2024.128957
Citations: 0
Abstract
By exploiting the thermal radiation information from infrared images and the detailed texture information from visible light images, image fusion technology enables more accurate target identification. However, most current image fusion methods rely primarily on convolutional neural networks for cross-modal local feature extraction and do not fully exploit long-range contextual information, resulting in limited performance in complex scenarios. To address this issue, this paper proposes an infrared and visible light image fusion method termed DSAFuse, which is based on dual-branch spatially adaptive feature extraction. Specifically, a unimodal feature mixing module performs multi-scale spatially adaptive feature extraction on both modalities with shared weights. The extracted features are then fed into a dual-branch feature extraction module comprising flatten transformer blocks and vanilla blocks, which extract low-frequency texture features and high-frequency local detail features, respectively. Subsequently, features from both modalities are concatenated, and a bimodal feature mixing module reconstructs the fused image to generate semantically rich fusion results. Additionally, to achieve end-to-end unsupervised training, a loss function consisting of decomposition loss, gradient loss, and structural similarity loss is designed. Qualitative and quantitative experimental results demonstrate that DSAFuse outperforms state-of-the-art IVIF methods across various benchmark datasets. It effectively preserves the texture details and target features of the source images, producing satisfactory fusion results even in harsh environments and enhancing downstream visual tasks.
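The abstract outlines a concrete pipeline: a shared-weight unimodal feature mixing encoder, a dual-branch stage (a transformer-style branch for long-range structure and a plain convolutional branch for local detail), concatenation of both modalities' features, reconstruction of the fused image, and a composite unsupervised loss. The sketch below is only an illustration of that pipeline under assumptions, not the authors' implementation: module names (UnimodalMix, GlobalBranch, LocalBranch, Fuser), channel sizes, and the simplified loss (intensity plus gradient terms, with the decomposition and structural-similarity terms omitted) are all hypothetical.

```python
# Illustrative PyTorch sketch of a DSAFuse-style fusion pipeline (not the paper's code).
# Module names, channel sizes, and loss weights are assumptions for demonstration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnimodalMix(nn.Module):
    """Shared-weight, multi-scale feature mixing applied to each (single-channel) modality."""
    def __init__(self, ch=32):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, padding=1)
        self.branch3 = nn.Conv2d(ch, ch, 3, padding=1)   # finer-scale context
        self.branch5 = nn.Conv2d(ch, ch, 5, padding=2)   # coarser-scale context
        self.mix = nn.Conv2d(2 * ch, ch, 1)              # mix the two scales
        self.act = nn.GELU()

    def forward(self, x):
        x = self.act(self.stem(x))
        y = torch.cat([self.branch3(x), self.branch5(x)], dim=1)
        return self.act(self.mix(y))


class GlobalBranch(nn.Module):
    """Transformer-style branch standing in for the flatten transformer blocks
    (long-range, low-frequency structure)."""
    def __init__(self, ch=32, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(ch)
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        t = self.norm(tokens)
        out, _ = self.attn(t, t, t)
        return (tokens + out).transpose(1, 2).reshape(b, c, h, w)


class LocalBranch(nn.Module):
    """Plain convolutional (vanilla) branch for high-frequency local detail."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.GELU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class Fuser(nn.Module):
    """Shared encoder -> dual branches per modality -> concat -> reconstruct."""
    def __init__(self, ch=32):
        super().__init__()
        self.encode = UnimodalMix(ch)                    # same weights for IR and visible
        self.global_branch = GlobalBranch(ch)
        self.local_branch = LocalBranch(ch)
        self.decode = nn.Sequential(
            nn.Conv2d(4 * ch, ch, 3, padding=1), nn.GELU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, ir, vis):
        feats = []
        for x in (ir, vis):
            f = self.encode(x)
            feats += [self.global_branch(f), self.local_branch(f)]
        return self.decode(torch.cat(feats, dim=1))


def gradient_loss(fused, ir, vis):
    """Push the fused gradients toward the stronger of the two source gradients."""
    kx = torch.tensor([[[[-1.0, 0.0, 1.0]]]])            # simple horizontal gradient kernel
    g_f, g_ir, g_vis = (F.conv2d(t, kx, padding=(0, 1)).abs() for t in (fused, ir, vis))
    return F.l1_loss(g_f, torch.maximum(g_ir, g_vis))


def fusion_loss(fused, ir, vis, alpha=1.0):
    """Simplified unsupervised objective: intensity + gradient terms. The paper's
    decomposition and structural-similarity terms would be added analogously."""
    intensity = F.l1_loss(fused, torch.maximum(ir, vis))
    return intensity + alpha * gradient_loss(fused, ir, vis)


if __name__ == "__main__":
    ir = torch.rand(1, 1, 64, 64)
    vis = torch.rand(1, 1, 64, 64)
    model = Fuser()
    fused = model(ir, vis)
    print(fused.shape, fusion_loss(fused, ir, vis).item())
```

The key design point the abstract emphasizes is that the two branches are complementary: the attention-based branch captures long-range context that purely convolutional fusion networks miss, while the vanilla convolutional branch retains fine local detail, and the shared unimodal encoder keeps the feature spaces of the infrared and visible inputs aligned before fusion.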
Journal description:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.