一种基于扩散的多模态图像融合方法

IF 13.7
Pengwei Liang;Junjun Jiang;Qing Ma;Chenyang Wang;Xianming Liu;Jiayi Ma
{"title":"一种基于扩散的多模态图像融合方法","authors":"Pengwei Liang;Junjun Jiang;Qing Ma;Chenyang Wang;Xianming Liu;Jiayi Ma","doi":"10.1109/TIP.2025.3593775","DOIUrl":null,"url":null,"abstract":"Infrared images exhibit a significantly different appearance compared to visible counterparts. Existing infrared and visible image fusion (IVF) methods fuse features from both infrared and visible images, producing a new “image” appearance not inherently captured by any existing device. From an appearance perspective, infrared, visible, and fused images belong to different data domains. This difference makes it challenging to apply fused images because their domain-specific appearance may be difficult for downstream systems, e.g., pre-trained segmentation models. Therefore, accurately assessing the quality of the fused image is challenging. To address those problem, we propose a novel IVF method, FusionINV, which produces fused images with an appearance similar to visible images. FusionINV employs the pre-trained Stable Diffusion (SD) model to invert infrared images into the noise feature space. To inject visible-style appearance information into the infrared features, we leverage the inverted features from visible images to guide this inversion process. In this way, we can embed all the information of infrared and visible images in the noise feature space, and then use the prior of the pre-trained SD model to generate visually friendly images that align more closely with the RGB distribution. Specially, to generate the fused image, we design a tailored fusion rule within the denoising process that iteratively fuses visible-style infrared and visible features. In this way, the fused image falls into the visible domain and can be directly applied to existing downstream machine systems. Thanks to advancements in image inversion, FusionINV can directly produce fused images in a training-free manner. Extensive experiments demonstrate that FusionINV achieves outstanding performance in both human visual evaluation and machine perception tasks. The code is available at <uri>https://github.com/erfect2020/FusionINV</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5355-5368"},"PeriodicalIF":13.7000,"publicationDate":"2025-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FusionINV: A Diffusion-Based Approach for Multimodal Image Fusion\",\"authors\":\"Pengwei Liang;Junjun Jiang;Qing Ma;Chenyang Wang;Xianming Liu;Jiayi Ma\",\"doi\":\"10.1109/TIP.2025.3593775\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Infrared images exhibit a significantly different appearance compared to visible counterparts. Existing infrared and visible image fusion (IVF) methods fuse features from both infrared and visible images, producing a new “image” appearance not inherently captured by any existing device. From an appearance perspective, infrared, visible, and fused images belong to different data domains. This difference makes it challenging to apply fused images because their domain-specific appearance may be difficult for downstream systems, e.g., pre-trained segmentation models. Therefore, accurately assessing the quality of the fused image is challenging. To address those problem, we propose a novel IVF method, FusionINV, which produces fused images with an appearance similar to visible images. FusionINV employs the pre-trained Stable Diffusion (SD) model to invert infrared images into the noise feature space. To inject visible-style appearance information into the infrared features, we leverage the inverted features from visible images to guide this inversion process. In this way, we can embed all the information of infrared and visible images in the noise feature space, and then use the prior of the pre-trained SD model to generate visually friendly images that align more closely with the RGB distribution. Specially, to generate the fused image, we design a tailored fusion rule within the denoising process that iteratively fuses visible-style infrared and visible features. In this way, the fused image falls into the visible domain and can be directly applied to existing downstream machine systems. Thanks to advancements in image inversion, FusionINV can directly produce fused images in a training-free manner. Extensive experiments demonstrate that FusionINV achieves outstanding performance in both human visual evaluation and machine perception tasks. The code is available at <uri>https://github.com/erfect2020/FusionINV</uri>\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"34 \",\"pages\":\"5355-5368\"},\"PeriodicalIF\":13.7000,\"publicationDate\":\"2025-08-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11114795/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11114795/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

红外图像与可见光图像相比,呈现出明显不同的外观。现有的红外和可见光图像融合(IVF)方法融合了红外和可见光图像的特征,产生了任何现有设备都无法捕获的新“图像”外观。从外观角度看,红外图像、可见光图像和融合图像属于不同的数据域。这种差异使得应用融合图像具有挑战性,因为它们的特定领域外观可能难以用于下游系统,例如,预训练的分割模型。因此,准确评估融合图像的质量是一个挑战。为了解决这些问题,我们提出了一种新的IVF方法,FusionINV,它产生与可见图像相似的融合图像。FusionINV利用预训练的SD (Stable Diffusion)模型将红外图像反演到噪声特征空间中。为了在红外特征中注入可见样式的外观信息,我们利用可见光图像的倒转特征来指导反演过程。这样,我们可以将红外图像和可见光图像的所有信息嵌入到噪声特征空间中,然后利用预训练的SD模型的先验,生成更接近RGB分布的视觉友好图像。为了生成融合图像,我们在去噪过程中设计了定制化的融合规则,迭代融合可见光红外和可见光特征。这样,融合后的图像就进入了可见域,可以直接应用到现有的下游机器系统中。由于图像反演技术的进步,FusionINV可以直接产生融合图像,无需训练。大量的实验表明,FusionINV在人类视觉评估和机器感知任务中都取得了出色的表现。代码可在https://github.com/erfect2020/FusionINV上获得
本文章由计算机程序翻译,如有差异,请以英文原文为准。
FusionINV: A Diffusion-Based Approach for Multimodal Image Fusion
Infrared images exhibit a significantly different appearance compared to visible counterparts. Existing infrared and visible image fusion (IVF) methods fuse features from both infrared and visible images, producing a new “image” appearance not inherently captured by any existing device. From an appearance perspective, infrared, visible, and fused images belong to different data domains. This difference makes it challenging to apply fused images because their domain-specific appearance may be difficult for downstream systems, e.g., pre-trained segmentation models. Therefore, accurately assessing the quality of the fused image is challenging. To address those problem, we propose a novel IVF method, FusionINV, which produces fused images with an appearance similar to visible images. FusionINV employs the pre-trained Stable Diffusion (SD) model to invert infrared images into the noise feature space. To inject visible-style appearance information into the infrared features, we leverage the inverted features from visible images to guide this inversion process. In this way, we can embed all the information of infrared and visible images in the noise feature space, and then use the prior of the pre-trained SD model to generate visually friendly images that align more closely with the RGB distribution. Specially, to generate the fused image, we design a tailored fusion rule within the denoising process that iteratively fuses visible-style infrared and visible features. In this way, the fused image falls into the visible domain and can be directly applied to existing downstream machine systems. Thanks to advancements in image inversion, FusionINV can directly produce fused images in a training-free manner. Extensive experiments demonstrate that FusionINV achieves outstanding performance in both human visual evaluation and machine perception tasks. The code is available at https://github.com/erfect2020/FusionINV
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信