{"title":"F2Fusion: Frequency Feature Fusion Network for Infrared and Visible Image via Contourlet Transform and Mamba-UNet","authors":"Renhe Liu;Han Wang;Kai Hu;Shaochu Wang;Yu Liu","doi":"10.1109/TIM.2025.3580829","DOIUrl":null,"url":null,"abstract":"To integrate complementary thermal and texture information from source infrared (IR) and visible (VIS) images into a comprehensive fused image, traditional multiscale transform algorithms, and deep neural networks have been extensively explored for IR and VIS image fusion (IVIF). However, existing methods often face difficulties combining the strengths of these two approaches, particularly when it comes to balancing the preservation of salient and texture information in challenging conditions such as low light, glare, and overexposure. This article proposes a novel frequency feature fusion network (F2Fusion) that exploits detailed space-frequency transformation through contourlet transform (CT) and multiscale long-range learning via the Mamba-UNet architecture. The Mamba block is embedded into the multiscale encoder and decoder structures to improve feature extraction and image reconstruction performance. The CT operation replaces the conventional pooling layer in the multiscale encoder, converting spatial features into high- and low-frequency subbands. We then introduce a dual-branch frequency feature fusion module to facilitate the fusion of cross-modality illumination information and fine details based on the distinct characteristics of different frequency subbands. In addition, we design a composite loss function, which includes both gradient and salient constraints, to guide the precise synthesis of salient targets and texture regions. Qualitative and quantitative comparisons across three benchmark datasets demonstrate that the proposed method outperforms recent state-of-the-art (SOTA) fusion techniques. Extended experimental results on downstream object detection tasks further validate the distinct advantages of the proposed architecture for fusion through precise frequency decomposition. The code is available at: <uri>https://github.com/lrh-1994/F2Fusion</uri>","PeriodicalId":13341,"journal":{"name":"IEEE Transactions on Instrumentation and Measurement","volume":"74 ","pages":"1-17"},"PeriodicalIF":5.9000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Instrumentation and Measurement","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11042881/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
To integrate complementary thermal and texture information from source infrared (IR) and visible (VIS) images into a comprehensive fused image, traditional multiscale transform algorithms and deep neural networks have been extensively explored for IR and VIS image fusion (IVIF). However, existing methods often face difficulties in combining the strengths of these two approaches, particularly in balancing the preservation of salient and texture information under challenging conditions such as low light, glare, and overexposure. This article proposes a novel frequency feature fusion network (F2Fusion) that exploits detailed space-frequency transformation through the contourlet transform (CT) and multiscale long-range learning via the Mamba-UNet architecture. The Mamba block is embedded into the multiscale encoder and decoder structures to improve feature extraction and image reconstruction performance. The CT operation replaces the conventional pooling layer in the multiscale encoder, converting spatial features into high- and low-frequency subbands. We then introduce a dual-branch frequency feature fusion module to fuse cross-modality illumination information and fine details according to the distinct characteristics of the different frequency subbands. In addition, we design a composite loss function, which includes both gradient and salient constraints, to guide the precise synthesis of salient targets and texture regions. Qualitative and quantitative comparisons on three benchmark datasets demonstrate that the proposed method outperforms recent state-of-the-art (SOTA) fusion techniques. Extended experiments on downstream object detection tasks further validate the distinct advantages of the proposed architecture for fusion through precise frequency decomposition. The code is available at: https://github.com/lrh-1994/F2Fusion
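For readers who want a concrete sense of the composite loss described in the abstract, below is a minimal PyTorch sketch of a gradient-plus-salient objective of the kind commonly used in IVIF work. This is an illustrative assumption rather than the authors' released code: the exact formulation, saliency targets, and weights in F2Fusion may differ, and all names here (sobel_gradient, composite_fusion_loss, w_grad, w_sal) are hypothetical.

import torch
import torch.nn.functional as F

def sobel_gradient(img: torch.Tensor) -> torch.Tensor:
    # Per-pixel gradient magnitude of a single-channel batch (B, 1, H, W),
    # computed with fixed Sobel kernels.
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)  # Sobel y-kernel is the transpose of the x-kernel
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return gx.abs() + gy.abs()

def composite_fusion_loss(fused, ir, vis, w_grad=1.0, w_sal=1.0):
    # Gradient constraint: the fused image should reproduce the stronger
    # edge response of the two sources at every pixel (texture preservation).
    grad_target = torch.maximum(sobel_gradient(ir), sobel_gradient(vis))
    loss_grad = F.l1_loss(sobel_gradient(fused), grad_target)

    # Salient constraint: pull fused intensities toward the element-wise
    # maximum of the sources, which favors bright thermal targets.
    sal_target = torch.maximum(ir, vis)
    loss_sal = F.l1_loss(fused, sal_target)

    return w_grad * loss_grad + w_sal * loss_sal

With an objective of this shape, the gradient term steers the network toward the fine texture of the VIS input while the salient term preserves high-intensity IR targets; the paper's actual gradient and salient constraints may define or weight these targets differently.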
About the Journal
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor, and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality, and applications. The scope of these papers may encompass: (1) the theory, methodology, and practice of measurement; (2) the design, development, and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning, and processing signals; (3) the analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support for the establishment and maintenance of technical standards in the field of instrumentation and measurement.