{"title":"CMEFusion: Cross-Modal Enhancement and Fusion of FIR and Visible Images","authors":"Xi Tong;Xing Luo;Jiangxin Yang;Yanpeng Cao","doi":"10.1109/TCI.2024.3436716","DOIUrl":null,"url":null,"abstract":"The fusion of far infrared (FIR) and visible images aims to generate a high-quality composite image that contains salient structures and abundant texture details for human visual perception. However, the existing fusion methods typically fall short of utilizing complementary source image characteristics to boost the features extracted from degraded visible or FIR images, thus they cannot generate satisfactory fusion results in adverse lighting or weather conditions. In this paper, we propose a novel Cross-Modal multispectral image Enhancement and Fusion framework (CMEFusion), which adaptively enhances both FIR and visible inputs by leveraging complementary cross-modal features to further facilitate multispectral feature aggregation. Specifically, we first present a new cross-modal image enhancement sub-network (CMIENet), which is built on a CNN-Transformer hybrid architecture to perform the complementary exchange of local-salient and global-contextual features extracted from FIR and visible modalities, respectively. Then, we design a gradient-content differential fusion sub-network (GCDFNet) to progressively integrate decoupled gradient and content information via modified central difference convolution. Finally, we present a comprehensive joint enhancement-fusion multi-term loss function to drive the model to narrow the optimization gap between the above-mentioned two sub-networks based on the self-supervised aspects of exposure, color, structure, and intensity. In this manner, the proposed CMEFusion model facilitates better-performing visible and FIR image fusion in an end-to-end way, achieving enhanced visual quality with more natural and realistic appearances. Extensive experiments validate that CMEFusion surpasses state-of-the-art image fusion algorithms, as evidenced by superior performance in both visual quality and quantitative evaluations.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"1331-1345"},"PeriodicalIF":4.2000,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computational Imaging","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10620627/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citation count: 0
Abstract
The fusion of far infrared (FIR) and visible images aims to generate a high-quality composite image that contains salient structures and abundant texture details for human visual perception. However, existing fusion methods typically fall short of exploiting complementary source-image characteristics to boost the features extracted from degraded visible or FIR images, and thus cannot produce satisfactory fusion results under adverse lighting or weather conditions. In this paper, we propose a novel Cross-Modal multispectral image Enhancement and Fusion framework (CMEFusion), which adaptively enhances both FIR and visible inputs by leveraging complementary cross-modal features to further facilitate multispectral feature aggregation. Specifically, we first present a new cross-modal image enhancement sub-network (CMIENet), built on a CNN-Transformer hybrid architecture, to perform the complementary exchange of local-salient and global-contextual features extracted from the FIR and visible modalities, respectively. Then, we design a gradient-content differential fusion sub-network (GCDFNet) to progressively integrate decoupled gradient and content information via modified central difference convolution. Finally, we present a comprehensive joint enhancement-fusion multi-term loss function that narrows the optimization gap between the two sub-networks using self-supervised terms for exposure, color, structure, and intensity. In this manner, the proposed CMEFusion model performs visible and FIR image fusion end-to-end, achieving enhanced visual quality with more natural and realistic appearances. Extensive experiments validate that CMEFusion surpasses state-of-the-art image fusion algorithms in both visual quality and quantitative evaluations.
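The abstract does not spell out how GCDFNet's "modified" central difference convolution differs from the standard operator, so as a point of reference only, below is a minimal PyTorch sketch of the standard central difference convolution (CDC) that such a module builds on. The class name, the blending factor theta, and all hyperparameters are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CentralDifferenceConv2d(nn.Module):
    """Standard central difference convolution (CDC).

    Computes y = conv(x) - theta * (sum of kernel weights) * x_center,
    i.e. a blend of a vanilla convolution and a central-difference
    term that emphasizes local gradient (edge) information.
    theta = 0 recovers a plain convolution.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=padding, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        if self.theta == 0.0:
            return out
        # The central-difference term equals a 1x1 convolution of x with
        # the spatial sum of each filter, subtracted from the vanilla
        # response: sum_n w_n * (x_n - x_c) = conv(x) - (sum_n w_n) * x_c.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_center = F.conv2d(x, kernel_sum)
        return out - self.theta * out_center


if __name__ == "__main__":
    x = torch.randn(1, 16, 64, 64)
    cdc = CentralDifferenceConv2d(16, 32)
    print(cdc(x).shape)  # torch.Size([1, 32, 64, 64])
```

Setting theta to 0 recovers a vanilla convolution, while larger values weight the gradient-like central-difference response more heavily, which makes CDC variants a natural fit for the gradient branch of a gradient-content decoupled fusion network.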
Journal Introduction:
The IEEE Transactions on Computational Imaging will publish articles where computation plays an integral role in the image formation process. Papers will cover all areas of computational imaging ranging from fundamental theoretical methods to the latest innovative computational imaging system designs. Topics of interest will include advanced algorithms and mathematical techniques, model-based data inversion, methods for image and signal recovery from sparse and incomplete data, techniques for non-traditional sensing of image data, methods for dynamic information acquisition and extraction from imaging sensors, software and hardware for efficient computation in imaging systems, and highly novel imaging system design.