利用简单的混合 CNN-Transformer 网络协调图像

IF 6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks Pub Date : 2024-08-30 DOI:10.1016/j.neunet.2024.106673

{"title":"利用简单的混合 CNN-Transformer 网络协调图像","authors":"","doi":"10.1016/j.neunet.2024.106673","DOIUrl":null,"url":null,"abstract":"<div><p>Image harmonization seeks to transfer the illumination distribution of the background to that of the foreground within a composite image. Existing methods lack the ability of establishing global–local pixel illumination dependencies between foreground and background of composite images, which is indispensable for sharp and color-consistent harmonized image generation. To overcome this challenge, we design a novel Simple Hybrid CNN-Transformer Network (SHT-Net), which is formulated into an efficient symmetrical hierarchical architecture. It is composed of two newly designed light-weight Transformer blocks. Firstly, the scale-aware gated block is designed to capture multi-scale features through different heads and expand the receptive fields, which facilitates to generate images with fine-grained details. Secondly, we introduce a simple parallel attention block, which integrates the window-based self-attention and gated channel attention in parallel, resulting in simultaneously global–local pixel illumination relationship modeling capability. Besides, we propose an efficient simple feed forward network to filter out less informative features and allow the features to contribute to generating photo-realistic harmonized results passing through. Extensive experiments on image harmonization benchmarks indicate that our method achieve promising quantitative and qualitative results. The code and pre-trained models are available at <span><span>https://github.com/guanguanboy/SHT-Net</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0000,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Image harmonization with Simple Hybrid CNN-Transformer Network\",\"authors\":\"\",\"doi\":\"10.1016/j.neunet.2024.106673\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Image harmonization seeks to transfer the illumination distribution of the background to that of the foreground within a composite image. Existing methods lack the ability of establishing global–local pixel illumination dependencies between foreground and background of composite images, which is indispensable for sharp and color-consistent harmonized image generation. To overcome this challenge, we design a novel Simple Hybrid CNN-Transformer Network (SHT-Net), which is formulated into an efficient symmetrical hierarchical architecture. It is composed of two newly designed light-weight Transformer blocks. Firstly, the scale-aware gated block is designed to capture multi-scale features through different heads and expand the receptive fields, which facilitates to generate images with fine-grained details. Secondly, we introduce a simple parallel attention block, which integrates the window-based self-attention and gated channel attention in parallel, resulting in simultaneously global–local pixel illumination relationship modeling capability. Besides, we propose an efficient simple feed forward network to filter out less informative features and allow the features to contribute to generating photo-realistic harmonized results passing through. Extensive experiments on image harmonization benchmarks indicate that our method achieve promising quantitative and qualitative results. The code and pre-trained models are available at <span><span>https://github.com/guanguanboy/SHT-Net</span><svg><path></path></svg></span>.</p></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2024-08-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608024005975\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608024005975","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

图像协调的目的是在合成图像中将背景的光照分布转移到前景的光照分布上。现有的方法缺乏在合成图像的前景和背景之间建立全局-局部像素光照相关性的能力，而这对于生成清晰且色彩一致的协调图像是不可或缺的。为了克服这一难题，我们设计了一种新颖的简单混合 CNN 变换器网络（SHT-Net），它被设计成一种高效的对称分层架构。它由两个新设计的轻量级变换器模块组成。首先，规模感知门控块旨在通过不同的头捕捉多尺度特征并扩大感受野，从而有助于生成具有细粒度细节的图像。其次，我们引入了一个简单的并行注意模块，将基于窗口的自注意和门控通道注意并行整合，从而同时实现全局-局部像素光照关系建模能力。此外，我们还提出了一种高效的简单前馈网络，用于过滤信息量较少的特征，并让这些特征为生成照片般逼真的协调结果做出贡献。对图像协调基准的广泛实验表明，我们的方法在定量和定性方面都取得了可喜的成果。代码和预训练模型可在 https://github.com/guanguanboy/SHT-Net 上获取。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Image harmonization with Simple Hybrid CNN-Transformer Network

Image harmonization seeks to transfer the illumination distribution of the background to that of the foreground within a composite image. Existing methods lack the ability of establishing global–local pixel illumination dependencies between foreground and background of composite images, which is indispensable for sharp and color-consistent harmonized image generation. To overcome this challenge, we design a novel Simple Hybrid CNN-Transformer Network (SHT-Net), which is formulated into an efficient symmetrical hierarchical architecture. It is composed of two newly designed light-weight Transformer blocks. Firstly, the scale-aware gated block is designed to capture multi-scale features through different heads and expand the receptive fields, which facilitates to generate images with fine-grained details. Secondly, we introduce a simple parallel attention block, which integrates the window-based self-attention and gated channel attention in parallel, resulting in simultaneously global–local pixel illumination relationship modeling capability. Besides, we propose an efficient simple feed forward network to filter out less informative features and allow the features to contribute to generating photo-realistic harmonized results passing through. Extensive experiments on image harmonization benchmarks indicate that our method achieve promising quantitative and qualitative results. The code and pre-trained models are available at https://github.com/guanguanboy/SHT-Net.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Neural Networks 工程技术-计算机：人工智能

CiteScore

13.90

自引率

7.70%

发文量

425

审稿时长

67 days

期刊介绍： Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.