Infrared and visible image fusion based on global context network

IF 1.0 | CAS Zone 4 (Computer Science) | JCR Q4 (Engineering, Electrical & Electronic)
Yonghong Li, Yu Shi, Xingcheng Pu, Suqiang Zhang
{"title":"基于全球背景网络的红外和可见光图像融合","authors":"Yonghong Li, Yu Shi, Xingcheng Pu, Suqiang Zhang","doi":"10.1117/1.jei.33.5.053016","DOIUrl":null,"url":null,"abstract":"Thermal radiation and texture data from two different sensor types are usually combined in the fusion of infrared and visible images for generating a single image. In recent years, convolutional neural network (CNN) based on deep learning has become the mainstream technology for many infrared and visible image fusion methods, which often extracts shallow features and ignores the role of long-range dependencies in the fusion task. However, due to its local perception characteristics, CNN can only obtain global contextual information by continuously stacking convolutional layers, which leads to low network efficiency and difficulty in optimization. To address this issue, we proposed a global context fusion network (GCFN) to model context using a global attention pool, which adopts a two-stage strategy. First, a GCFN-based autoencoder network is trained for extracting multi-scale local and global contextual features. To effectively incorporate the complementary information of the input image, a dual branch fusion network combining CNN and transformer is designed in the second step. Experimental results on a publicly available dataset demonstrate that the proposed method outperforms nine advanced methods in fusion performance on both subjective and objective metrics.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"23 1","pages":""},"PeriodicalIF":1.0000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Infrared and visible image fusion based on global context network\",\"authors\":\"Yonghong Li, Yu Shi, Xingcheng Pu, Suqiang Zhang\",\"doi\":\"10.1117/1.jei.33.5.053016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Thermal radiation and texture data from two different sensor types are usually combined in the fusion of infrared and visible images for generating a single image. In recent years, convolutional neural network (CNN) based on deep learning has become the mainstream technology for many infrared and visible image fusion methods, which often extracts shallow features and ignores the role of long-range dependencies in the fusion task. However, due to its local perception characteristics, CNN can only obtain global contextual information by continuously stacking convolutional layers, which leads to low network efficiency and difficulty in optimization. To address this issue, we proposed a global context fusion network (GCFN) to model context using a global attention pool, which adopts a two-stage strategy. First, a GCFN-based autoencoder network is trained for extracting multi-scale local and global contextual features. To effectively incorporate the complementary information of the input image, a dual branch fusion network combining CNN and transformer is designed in the second step. 
Experimental results on a publicly available dataset demonstrate that the proposed method outperforms nine advanced methods in fusion performance on both subjective and objective metrics.\",\"PeriodicalId\":54843,\"journal\":{\"name\":\"Journal of Electronic Imaging\",\"volume\":\"23 1\",\"pages\":\"\"},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2024-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Electronic Imaging\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1117/1.jei.33.5.053016\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Electronic Imaging","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1117/1.jei.33.5.053016","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Thermal radiation and texture data from two different sensor types are usually combined in the fusion of infrared and visible images for generating a single image. In recent years, convolutional neural network (CNN) based on deep learning has become the mainstream technology for many infrared and visible image fusion methods, which often extracts shallow features and ignores the role of long-range dependencies in the fusion task. However, due to its local perception characteristics, CNN can only obtain global contextual information by continuously stacking convolutional layers, which leads to low network efficiency and difficulty in optimization. To address this issue, we proposed a global context fusion network (GCFN) to model context using a global attention pool, which adopts a two-stage strategy. First, a GCFN-based autoencoder network is trained for extracting multi-scale local and global contextual features. To effectively incorporate the complementary information of the input image, a dual branch fusion network combining CNN and transformer is designed in the second step. Experimental results on a publicly available dataset demonstrate that the proposed method outperforms nine advanced methods in fusion performance on both subjective and objective metrics.
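The abstract does not give implementation details of the global attention pool used in GCFN, but the idea of pooling a whole feature map into a single context vector and redistributing it to every position can be sketched as follows. This is a minimal illustration in the spirit of GCNet-style context modeling; the module name GlobalContextBlock, the reduction ratio, and the residual fusion are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn


class GlobalContextBlock(nn.Module):
    """Illustrative global-attention-pool context block (GCNet-style sketch),
    not the paper's exact GCFN module."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # 1x1 conv producing one attention logit per spatial position.
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)
        # Channel-wise transform of the pooled context vector.
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Global attention pooling: softmax over all H*W positions, then weighted sum.
        weights = torch.softmax(self.context_mask(x).view(b, 1, h * w), dim=-1)  # (B, 1, HW)
        context = torch.matmul(x.view(b, c, h * w), weights.transpose(1, 2))     # (B, C, 1)
        context = context.view(b, c, 1, 1)
        # Broadcast the transformed global context back to every spatial position.
        return x + self.transform(context)


if __name__ == "__main__":
    block = GlobalContextBlock(channels=64)
    out = block(torch.randn(1, 64, 120, 160))
    print(out.shape)  # torch.Size([1, 64, 120, 160])
```

Because the context vector is computed from every spatial position at once, long-range dependencies are captured in a single step rather than by stacking many convolutional layers, which is the efficiency argument the abstract makes against plain CNNs.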
Source journal
Journal of Electronic Imaging (Engineering & Technology — Imaging Science & Photographic Technology)
CiteScore: 1.70
Self-citation rate: 27.30%
Articles published per year: 341
Review time: 4.0 months
Journal description: The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.