FusionINV: A Diffusion-Based Approach for Multimodal Image Fusion

Impact factor: 13.7

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. Pub Date: 2025-08-05. DOI: 10.1109/TIP.2025.3593775

Pengwei Liang;Junjun Jiang;Qing Ma;Chenyang Wang;Xianming Liu;Jiayi Ma

Volume 34, pages 5355-5368. Publication type: Journal Article. IEEE Xplore: https://ieeexplore.ieee.org/document/11114795/

Citations: 0

Abstract


FusionINV: A Diffusion-Based Approach for Multimodal Image Fusion

Infrared images differ significantly in appearance from their visible counterparts. Existing infrared and visible image fusion (IVF) methods fuse features from both infrared and visible images, producing a new “image” appearance not inherently captured by any existing device. From an appearance perspective, infrared, visible, and fused images belong to different data domains. This difference makes fused images hard to apply, because their domain-specific appearance may be difficult for downstream systems, e.g., pre-trained segmentation models, to handle. For the same reason, accurately assessing the quality of a fused image is challenging. To address these problems, we propose a novel IVF method, FusionINV, which produces fused images with an appearance similar to visible images. FusionINV employs the pre-trained Stable Diffusion (SD) model to invert infrared images into the noise feature space. To inject visible-style appearance information into the infrared features, we leverage the inverted features from visible images to guide this inversion process. In this way, we can embed all the information of the infrared and visible images in the noise feature space, and then use the prior of the pre-trained SD model to generate visually friendly images that align more closely with the RGB distribution. Specifically, to generate the fused image, we design a tailored fusion rule within the denoising process that iteratively fuses visible-style infrared features and visible features. As a result, the fused image falls into the visible domain and can be directly applied to existing downstream machine systems. Thanks to advancements in image inversion, FusionINV can produce fused images directly, in a training-free manner. Extensive experiments demonstrate that FusionINV achieves outstanding performance in both human visual evaluation and machine perception tasks. The code is available at https://github.com/erfect2020/FusionINV
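The abstract does not give the inversion procedure or the fusion rule, so the following is only an illustrative sketch of the general idea (invert two inputs into a noise space, then fuse while denoising), not the authors' implementation. It assumes deterministic DDIM-style inversion, a linear beta schedule, NumPy arrays standing in for SD latents, and a weighted average as a placeholder fusion rule. The names `make_alphas`, `ddim_step`, `invert`, `fuse`, `denoise_fused`, and `eps_model` are all hypothetical; in a real pipeline `eps_model` would be the pre-trained SD noise predictor.

```python
import numpy as np


def make_alphas(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update that moves x between two noise levels."""
    x0_pred = (x - np.sqrt(1.0 - alpha_from) * eps) / np.sqrt(alpha_from)
    return np.sqrt(alpha_to) * x0_pred + np.sqrt(1.0 - alpha_to) * eps


def invert(x0, eps_model, alphas):
    """Walk a clean latent towards noise and keep every intermediate latent.

    traj[0] is the clean latent; traj[k] sits at noise level alphas[k - 1].
    """
    traj = [x0]
    x, alpha_prev = x0, 1.0
    for t, alpha_t in enumerate(alphas):
        x = ddim_step(x, eps_model(x, t), alpha_prev, alpha_t)
        traj.append(x)
        alpha_prev = alpha_t
    return traj


def fuse(feat_a, feat_b, weight=0.5):
    """Placeholder blend (assumption): NOT the fusion rule from the paper."""
    return weight * feat_a + (1.0 - weight) * feat_b


def denoise_fused(traj_ir, traj_vis, eps_model, alphas, weight=0.5):
    """Denoise from the fused noisy latents, re-fusing at each noise level."""
    x = fuse(traj_ir[-1], traj_vis[-1], weight)
    for t in range(len(alphas) - 1, -1, -1):
        alpha_to = alphas[t - 1] if t > 0 else 1.0
        x = ddim_step(x, eps_model(x, t), alphas[t], alpha_to)
        if t > 0:
            # After this step x sits at the noise level of traj_vis[t].
            x = fuse(x, traj_vis[t], weight)
    return x


def toy_eps(x, t):
    """Stand-in noise predictor that depends on the step index only."""
    return np.full_like(x, 0.01 * (t + 1))


rng = np.random.default_rng(0)
alphas = make_alphas(50)
latent_ir = rng.normal(size=(4, 8, 8))   # hypothetical infrared latent
latent_vis = rng.normal(size=(4, 8, 8))  # hypothetical visible latent
traj_ir = invert(latent_ir, toy_eps, alphas)
traj_vis = invert(latent_vis, toy_eps, alphas)
fused = denoise_fused(traj_ir, traj_vis, toy_eps, alphas, weight=0.5)
```

Because `toy_eps` depends only on the step index, inversion followed by denoising is an exact round trip here. With a real noise predictor that also depends on the latent, DDIM inversion is only approximate, so reconstruction quality would need to be checked separately.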
