{"title":"基于全球背景网络的红外和可见光图像融合","authors":"Yonghong Li, Yu Shi, Xingcheng Pu, Suqiang Zhang","doi":"10.1117/1.jei.33.5.053016","DOIUrl":null,"url":null,"abstract":"Thermal radiation and texture data from two different sensor types are usually combined in the fusion of infrared and visible images for generating a single image. In recent years, convolutional neural network (CNN) based on deep learning has become the mainstream technology for many infrared and visible image fusion methods, which often extracts shallow features and ignores the role of long-range dependencies in the fusion task. However, due to its local perception characteristics, CNN can only obtain global contextual information by continuously stacking convolutional layers, which leads to low network efficiency and difficulty in optimization. To address this issue, we proposed a global context fusion network (GCFN) to model context using a global attention pool, which adopts a two-stage strategy. First, a GCFN-based autoencoder network is trained for extracting multi-scale local and global contextual features. To effectively incorporate the complementary information of the input image, a dual branch fusion network combining CNN and transformer is designed in the second step. Experimental results on a publicly available dataset demonstrate that the proposed method outperforms nine advanced methods in fusion performance on both subjective and objective metrics.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"23 1","pages":""},"PeriodicalIF":1.0000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Infrared and visible image fusion based on global context network\",\"authors\":\"Yonghong Li, Yu Shi, Xingcheng Pu, Suqiang Zhang\",\"doi\":\"10.1117/1.jei.33.5.053016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Thermal radiation and texture data from two different sensor types are usually combined in the fusion of infrared and visible images for generating a single image. In recent years, convolutional neural network (CNN) based on deep learning has become the mainstream technology for many infrared and visible image fusion methods, which often extracts shallow features and ignores the role of long-range dependencies in the fusion task. However, due to its local perception characteristics, CNN can only obtain global contextual information by continuously stacking convolutional layers, which leads to low network efficiency and difficulty in optimization. To address this issue, we proposed a global context fusion network (GCFN) to model context using a global attention pool, which adopts a two-stage strategy. First, a GCFN-based autoencoder network is trained for extracting multi-scale local and global contextual features. To effectively incorporate the complementary information of the input image, a dual branch fusion network combining CNN and transformer is designed in the second step. 
Experimental results on a publicly available dataset demonstrate that the proposed method outperforms nine advanced methods in fusion performance on both subjective and objective metrics.\",\"PeriodicalId\":54843,\"journal\":{\"name\":\"Journal of Electronic Imaging\",\"volume\":\"23 1\",\"pages\":\"\"},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2024-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Electronic Imaging\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1117/1.jei.33.5.053016\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Electronic Imaging","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1117/1.jei.33.5.053016","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Infrared and visible image fusion based on global context network
Abstract:
Infrared and visible image fusion combines the thermal radiation and texture information captured by two different sensor types into a single image. In recent years, deep-learning methods based on convolutional neural networks (CNNs) have become the mainstream approach to infrared and visible image fusion, but they often extract only shallow features and ignore the long-range dependencies that matter in the fusion task. Because of its local receptive field, a CNN can acquire global contextual information only by stacking many convolutional layers, which makes the network inefficient and hard to optimize. To address this issue, we propose a global context fusion network (GCFN) that models context with a global attention pool and adopts a two-stage strategy. First, a GCFN-based autoencoder is trained to extract multi-scale local and global contextual features. Second, a dual-branch fusion network combining a CNN and a transformer is designed to effectively incorporate the complementary information of the input images. Experimental results on a publicly available dataset demonstrate that the proposed method outperforms nine state-of-the-art methods on both subjective and objective fusion metrics.
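To make the "global attention pool" concrete: the description above matches the general shape of a GCNet-style global context block, in which a single softmax attention map pools features from all spatial positions into one context vector that is transformed and added back onto the feature map. The PyTorch sketch below is a minimal illustration of that mechanism, not the paper's actual implementation; the class name GlobalContextBlock, the bottleneck reduction ratio, and all layer sizes are assumptions made for the example.

import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Hypothetical GCNet-style global context block: attention pooling
    over all spatial positions, a bottleneck channel transform, and a
    residual addition back onto the input features."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # 1x1 conv producing one attention logit per spatial position
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)
        # bottleneck transform applied to the pooled context vector
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # softmax over all h*w positions -> global attention pool weights
        weights = self.attn(x).view(b, 1, h * w).softmax(dim=-1)
        feats = x.view(b, c, h * w)
        # weighted sum of features: one global context vector per image
        context = torch.bmm(feats, weights.transpose(1, 2)).view(b, c, 1, 1)
        # broadcast the transformed context back to every position
        return x + self.transform(context)

# quick shape check
block = GlobalContextBlock(channels=64)
print(block(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])

Compared with stacking convolutional layers to grow the receptive field, this pooling step gives every position access to image-wide context in a single operation, at a cost linear in the number of spatial positions.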
Journal introduction:
The Journal of Electronic Imaging publishes peer-reviewed papers covering all technology areas that make up the field of electronic imaging, including the design, engineering, and applications of electronic imaging systems.