A versatile interactive dual-path architecture with text modulation for multi-band image fusion

IF 3; Region 3 (Engineering & Technology); JCR Q2 (Engineering, Electrical & Electronic)

Digital Signal Processing. Pub Date: 2025-09-10. DOI: 10.1016/j.dsp.2025.105569

Jinmei Zhang, Juntao Huang, Jiangpeng Du, Chao Zhang, Mengdi Li

Volume 168, Article 105569. Full text: https://www.sciencedirect.com/science/article/pii/S1051200425005913

Citations: 0

Abstract

Image fusion aims to integrate complementary information from source images to generate images with more comprehensive detail representation. Compared with conventional single-band images, multi-band images provide a wider range of radiative detail and information. With the rapid advancement of deep learning techniques, the fusion of multi-band images can provide enhanced feature representations for target detection, demonstrating significant application value across military, medical, and environmental monitoring domains. Despite its growing popularity, image fusion remains challenging because different sources depict scene content in inherently different ways. Current methods generally suffer from three limitations: 1) sensitivity to slight misalignment between source images, which produces artifacts in the fused results; 2) ineffective handling of interference from low-quality source images, such as noise or degradation; and 3) a lack of interactive mechanisms to accommodate diverse subjective and objective requirements. To address these challenges, we propose a versatile interactive dual-path architecture with text modulation for multi-band image fusion. First, we design a micro-registration deformation dual-path fusion module that employs explicit deformation fields to compensate for geometric misalignment, thereby mitigating artifacts, while incorporating a feature-adaptive selection mechanism to enhance texture details and contrast. Second, we propose a dynamic text-modulated fusion module built on a dual-path attention mechanism, in which text embeddings serve as conditional signals that drive the generation of both channel-wise and spatial attention weights, simultaneously addressing image quality degradation and the need for interactive, flexible fusion. Extensive experiments on two public benchmark datasets and one self-constructed multi-band infrared dataset show that our method outperforms state-of-the-art (SOTA) methods in both quantitative and qualitative evaluations, effectively improving image fusion performance and degradation handling.
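The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no layer definitions, so every name here (`warp`, `text_modulated_attention`, `fuse`, the projection matrices `w_ch` and `w_sp`) is hypothetical, the weights are random stand-ins for learned parameters, the text embedding is a random vector standing in for the output of a text encoder, and plain averaging replaces the feature-adaptive selection step. The sketch only shows the general idea of resampling one band with an explicit deformation field and then gating the fused features with channel-wise and spatial weights derived from a text embedding.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def warp(feat, flow):
    """Bilinearly resample feat (C, H, W) at pixel positions shifted by flow (2, H, W).

    flow[0] holds the x (column) offset and flow[1] the y (row) offset, in pixels.
    Sampling positions are clamped to the image border.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bot * wy


def text_modulated_attention(feat, text_emb, w_ch, w_sp):
    """Gate feat (C, H, W) with weights conditioned on text_emb (D,).

    w_ch and w_sp are (C, D) projections (assumed, randomly initialised here).
    """
    c = feat.shape[0]
    # Channel path: the text embedding yields one gate in (0, 1) per channel.
    ch_gate = sigmoid(w_ch @ text_emb)
    # Spatial path: the text embedding yields a query that is compared with
    # the feature vector at every pixel, giving one gate in (0, 1) per location.
    query = w_sp @ text_emb
    sp_gate = sigmoid(np.einsum("c,chw->hw", query, feat) / np.sqrt(c))
    return feat * ch_gate[:, None, None] * sp_gate[None, :, :]


def fuse(feat_a, feat_b, flow, text_emb, w_ch, w_sp):
    """Align band B to band A, merge, then apply text-conditioned gating."""
    aligned_b = warp(feat_b, flow)
    merged = 0.5 * (feat_a + aligned_b)  # placeholder for adaptive selection
    return text_modulated_attention(merged, text_emb, w_ch, w_sp)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, D = 4, 8, 8, 16
    band_a = rng.normal(size=(C, H, W))
    band_b = rng.normal(size=(C, H, W))
    flow = 0.5 * rng.normal(size=(2, H, W))  # small sub-pixel misalignment
    text_emb = rng.normal(size=D)            # stand-in for an encoded prompt
    w_ch = rng.normal(size=(C, D))
    w_sp = rng.normal(size=(C, D))
    print(fuse(band_a, band_b, flow, text_emb, w_ch, w_sp).shape)
```

In a trained model the deformation field would be predicted from the two inputs and the projections would be learned; changing the text embedding then changes both gates, which is what makes the fusion steerable by a prompt.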




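Source journal

Digital Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 5.30

Self-citation rate: 17.20%

Articles published: 435

Review time: 66 days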

About the journal: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data,
• machine learning,
• internet of things,
• information security,
• systems biology and computational biology,
• financial time series analysis,
• autonomous vehicles,
• quantum computing,
• neuromorphic engineering,
• human-computer interaction and intelligent user interfaces,
• environmental signal processing,
• geophysical signal processing including seismic signal processing,
• chemioinformatics and bioinformatics,
• audio, visual and performance arts,
• disaster management and prevention,
• renewable energy,