A versatile interactive dual-path architecture with text modulation for multi-band image fusion
Jinmei Zhang, Juntao Huang, Jiangpeng Du, Chao Zhang, Mengdi Li
Digital Signal Processing, Volume 168, Article 105569. Published 2025-09-10. DOI: 10.1016/j.dsp.2025.105569
Journal metrics: impact factor 3.0; JCR Q2 (Engineering, Electrical & Electronic)
https://www.sciencedirect.com/science/article/pii/S1051200425005913
Citations: 0
Abstract
Image fusion aims to integrate complementary information from source images to generate images with more comprehensive detail representation. Compared with conventional single-band images, multi-band images provide a wider array of radiative details and information. With the rapid advancement of deep learning techniques, the fusion of multi-band images can provide enhanced feature representations for target detection, demonstrating significant application value across military, medical, and environmental monitoring domains. Despite its growing popularity, image fusion remains a challenging problem because different sources depict scene content in inherently different ways. Current methods generally suffer from three limitations: 1) sensitivity to slight misalignment between source images, which produces artifacts in the fused results; 2) ineffective handling of interference from low-quality source images, such as noise or other degradation; and 3) a lack of interactive mechanisms to accommodate diverse subjective and objective requirements. To address these challenges, we propose a versatile interactive dual-path architecture with text modulation for multi-band image fusion. First, we design a micro-registration deformation dual-path fusion module, which employs explicit deformation fields to compensate for geometric misalignment, thereby mitigating artifacts, while incorporating a feature-adaptive selection mechanism to enhance texture details and contrast. Second, we propose a dynamic text-modulated fusion module built on a dual-path attention mechanism, in which text embeddings serve as conditional signals that drive the generation of both channel-wise and spatial attention weights, simultaneously addressing image quality degradation and the need for flexible, interactive fusion. Extensive experiments on two public benchmark datasets and one self-constructed multi-band infrared dataset show that our method outperforms state-of-the-art (SOTA) methods in both quantitative and qualitative evaluations, effectively improving image fusion performance and the handling of degraded inputs.
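The abstract names two mechanisms without giving their internals: warping a source image with an explicit deformation field, and using a text embedding to drive channel-wise and spatial attention weights. The sketch below is only an illustration of those two general ideas in plain Python, not the authors' implementation. Every name in it (`warp`, `channel_attention`, `spatial_attention`, `modulate`, the weight arguments `W_c` and `w_s`) is hypothetical, the weights stand in for parameters a real network would learn, and the specific gating formulas are assumptions chosen for brevity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bilinear_sample(img, y, x):
    """Sample a 2D list `img` at a fractional (y, x), clamping to the border."""
    H, W = len(img), len(img[0])
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy


def warp(img, flow):
    """Apply an explicit deformation field: flow[i][j] = (dy, dx) in pixels."""
    return [
        [bilinear_sample(img, i + flow[i][j][0], j + flow[i][j][1])
         for j in range(len(img[0]))]
        for i in range(len(img))
    ]


def channel_attention(text_emb, W_c):
    """One gate in (0, 1) per channel, computed from the text embedding alone.

    W_c is a hypothetical (channels x embedding_dim) projection matrix.
    """
    return [sigmoid(sum(w * t for w, t in zip(row, text_emb))) for row in W_c]


def spatial_attention(feat, text_emb, w_s):
    """One gate per pixel: channel-mean of the features plus a text-derived bias.

    w_s is a hypothetical projection vector of length embedding_dim.
    """
    bias = sum(w * t for w, t in zip(w_s, text_emb))
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [sigmoid(sum(feat[c][i][j] for c in range(C)) / C + bias)
         for j in range(W)]
        for i in range(H)
    ]


def modulate(feat, text_emb, W_c, w_s):
    """Scale a (C x H x W) feature map by both text-conditioned gates."""
    ch = channel_attention(text_emb, W_c)
    sp = spatial_attention(feat, text_emb, w_s)
    return [
        [[feat[c][i][j] * ch[c] * sp[i][j] for j in range(len(feat[0][0]))]
         for i in range(len(feat[0]))]
        for c in range(len(feat))
    ]
```

In this sketch a zero deformation field leaves the image unchanged, and a zero text embedding gives every channel a neutral gate of 0.5, so any change in the output relative to that baseline comes from the text condition. A practical system would predict the deformation field and the projection weights with trained networks rather than pass them in by hand.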
About the journal:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy