CMEFusion: Cross-Modal Enhancement and Fusion of FIR and Visible Images
Authors: Xi Tong; Xing Luo; Jiangxin Yang; Yanpeng Cao
Journal: IEEE Transactions on Computational Imaging, vol. 10, pp. 1331-1345 (Impact Factor 4.2; JCR Q2, Engineering, Electrical & Electronic)
Publication date: 2024-08-02
Publication type: Journal Article
DOI: 10.1109/TCI.2024.3436716
URL: https://ieeexplore.ieee.org/document/10620627/
Citations: 0
Abstract
The fusion of far-infrared (FIR) and visible images aims to generate a high-quality composite image that contains salient structures and abundant texture details for human visual perception. However, existing fusion methods typically fall short of utilizing complementary source image characteristics to boost the features extracted from degraded visible or FIR images, so they cannot generate satisfactory fusion results in adverse lighting or weather conditions. In this paper, we propose a novel Cross-Modal multispectral image Enhancement and Fusion framework (CMEFusion), which adaptively enhances both FIR and visible inputs by leveraging complementary cross-modal features to further facilitate multispectral feature aggregation. Specifically, we first present a new cross-modal image enhancement sub-network (CMIENet), which is built on a CNN-Transformer hybrid architecture to perform the complementary exchange of local-salient and global-contextual features extracted from FIR and visible modalities, respectively. Then, we design a gradient-content differential fusion sub-network (GCDFNet) to progressively integrate decoupled gradient and content information via modified central difference convolution. Finally, we present a comprehensive joint enhancement-fusion multi-term loss function that drives the model to narrow the optimization gap between the two sub-networks based on the self-supervised aspects of exposure, color, structure, and intensity. In this manner, the proposed CMEFusion model facilitates better-performing visible and FIR image fusion in an end-to-end way, achieving enhanced visual quality with more natural and realistic appearances. Extensive experiments validate that CMEFusion surpasses state-of-the-art image fusion algorithms, as evidenced by superior performance in both visual quality and quantitative evaluations.
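The abstract does not say how the authors modify central difference convolution, so the following is only a minimal sketch of the standard, unmodified operator as it is commonly defined: an ordinary convolution minus `theta` times the centre pixel times the sum of the kernel weights. The function names, the nested-list image representation, and the default `theta` value are assumptions made for illustration and are not taken from the paper.

```python
def conv2d_valid(x, w):
    """Plain 2D cross-correlation (no padding, stride 1) on nested lists."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(w[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


def central_difference_conv2d(x, w, theta=0.7):
    """Standard central difference convolution (hypothetical helper).

    theta = 0 reduces to ordinary convolution; theta = 1 aggregates only
    the differences between each neighbour and the centre pixel.
    Assumes an odd-sized kernel so that a centre pixel exists.
    """
    kh, kw = len(w), len(w[0])
    w_sum = sum(sum(row) for row in w)
    vanilla = conv2d_valid(x, w)
    return [
        [
            # Output (r, c) is centred on input pixel (r + kh // 2, c + kw // 2).
            vanilla[r][c] - theta * w_sum * x[r + kh // 2][c + kw // 2]
            for c in range(len(vanilla[0]))
        ]
        for r in range(len(vanilla))
    ]
```

With `theta = 1`, a constant image produces an all-zero output while a step edge produces non-zero responses, which illustrates why an operator of this kind is suited to extracting gradient information separately from content information.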
About the journal:
The IEEE Transactions on Computational Imaging will publish articles where computation plays an integral role in the image formation process. Papers will cover all areas of computational imaging ranging from fundamental theoretical methods to the latest innovative computational imaging system designs. Topics of interest will include advanced algorithms and mathematical techniques, model-based data inversion, methods for image and signal recovery from sparse and incomplete data, techniques for non-traditional sensing of image data, methods for dynamic information acquisition and extraction from imaging sensors, software and hardware for efficient computation in imaging systems, and highly novel imaging system design.