MATCNN: Infrared and Visible Image Fusion Method Based on Multiscale CNN With Attention Transformer

Impact factor 5.9; Zone 2, Engineering & Technology; JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Instrumentation and Measurement. Published: 2025-02-21. DOI: 10.1109/TIM.2025.3542877

Jingjing Liu, Li Zhang, Xiaoyang Zeng, Wanquan Liu, Jianhua Zhang

Volume 74, pages 1-14. IEEE Xplore: https://ieeexplore.ieee.org/document/10897317/

Citations: 0

Abstract

While attention-based approaches have made considerable progress in image fusion and in handling long-range feature dependencies, their ability to capture local features is limited by the lack of diverse receptive fields. To overcome the shortcomings of existing fusion methods in extracting multiscale local features and preserving global features, this article proposes a novel cross-modal image fusion approach based on a multiscale convolutional neural network with an attention Transformer (MATCNN). MATCNN uses a multiscale fusion module (MSFM) to extract local features at different scales and a global feature extraction module (GFEM) to extract global features. Combining the two reduces the loss of detail and strengthens global feature representation. In addition, an information mask labels the relevant details in the source images so that the fused image preserves a larger share of the salient information from the infrared image and of the background texture from the visible image. A novel optimization algorithm then uses the mask to guide feature extraction by combining content, structural similarity index measure (SSIM), and global feature losses. Quantitative and qualitative evaluations on several datasets show that MATCNN highlights salient infrared targets, preserves more detail from the visible images, and achieves better fusion results for cross-modal images. The code of MATCNN will be available at https://github.com/zhang3849/MATCNN.git.
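The abstract names the ingredients of the training objective (an information mask plus content, SSIM, and global feature losses) but not their formulas. The NumPy sketch below shows one hypothetical way a mask could steer a content loss and an SSIM loss. Everything in it is an assumption rather than the authors' definition: the function names `global_ssim` and `mask_guided_loss`, the L1 form of the content term, the equal weighting of the two SSIM comparisons, the weights `alpha` and `beta`, and the single-window SSIM. The global feature loss is left out because its form is not given here.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM over the whole image as one window (an assumption;
    standard SSIM uses a sliding Gaussian window and averages the map)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def mask_guided_loss(fused, ir, vis, mask, alpha=1.0, beta=1.0):
    """Hypothetical mask-guided loss with content and SSIM terms only.

    mask == 1 marks salient infrared regions, mask == 0 marks visible
    background. The paper's global feature loss is not modeled here.
    """
    # Content term: follow IR intensities inside the mask, VIS outside it.
    content = np.mean(mask * np.abs(fused - ir) + (1 - mask) * np.abs(fused - vis))
    # Structural term: SSIM loss averaged over both source images.
    ssim_term = 1.0 - 0.5 * (global_ssim(fused, ir) + global_ssim(fused, vis))
    return alpha * content + beta * ssim_term


rng = np.random.default_rng(0)
ir, vis = rng.random((64, 64)), rng.random((64, 64))
mask = (ir > 0.8).astype(float)        # toy mask: brightest IR pixels
fused = mask * ir + (1 - mask) * vis   # copy IR inside mask, VIS outside
print(mask_guided_loss(fused, ir, vis, mask, beta=0.0))  # content term only: 0.0
print(mask_guided_loss(fused, ir, vis, mask))            # adds the SSIM term
```

With a binary mask and a fused image that copies infrared pixels inside the mask and visible pixels outside it, the content term is exactly zero, so only the structural term contributes. A trainable version would compute these terms in a differentiable framework and would normally use windowed SSIM (for example `skimage.metrics.structural_similarity` for evaluation).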

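Source journal

IEEE Transactions on Instrumentation and Measurement (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.00

Self-citation rate: 23.20%

Articles published: 1294

Review time: 3.9 months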

Journal description: Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.