Fine-grained hierarchical dynamics for image harmonization

IF 6 | Zone 1: Computer Science | Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks | Pub Date: 2025-05-27 | DOI: 10.1016/j.neunet.2025.107618

Peng He, Jun Yu, Liuxue Ju, Fang Gao

Volume 190, Article 107618 | https://www.sciencedirect.com/science/article/pii/S0893608025004988

Citations: 0

Abstract

Image harmonization aims to generate visually consistent composite images by ensuring compatibility between the foreground and background. Existing strategies based on a global transformation emphasize using background information to normalize the foreground, potentially overlooking significant variations in appearance among regions within various scenes. The coherence of local information also plays a critical role in generating visually consistent images. To address these issues, we propose the Hierarchical Dynamics Appearance Translation (HDAT) framework, which enables a seamless transition of features and parameters from local to global views in the network and adaptively adjusts foreground appearance based on the corresponding background information. Specifically, we introduce a dynamic region-aware convolution and a fine-grained mixed attention mechanism to coordinate global and local details. The dynamic region-aware convolution, guided by foreground masks, learns adaptive representations and correlations of foreground and background elements based on global dynamics. Meanwhile, the fine-grained mixed attention dynamically adjusts features across channels and positions to achieve local adaptation. Furthermore, we integrate a novel multi-scale feature calibration strategy to ensure information consistency across scales. Extensive experiments demonstrate that our HDAT framework significantly reduces the number of network parameters while outperforming existing methods both qualitatively and quantitatively.
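To make the contrast in the abstract concrete, the following is a minimal sketch of the kind of global, background-driven foreground normalization that the abstract attributes to existing methods. It is not the HDAT method and is not taken from the paper. The function name `harmonize_global`, the flat single-channel pixel list, and the mean/standard-deviation matching rule are all assumptions chosen for illustration.

```python
from statistics import fmean, pstdev

def harmonize_global(pixels, mask, eps=1e-6):
    """Shift and scale foreground values so their mean/std match the background's.

    pixels: flat list of floats (one channel of a composite image).
    mask:   flat list of 0/1 flags, 1 marking foreground pixels.
    Hypothetical helper for illustration only.
    """
    fg = [p for p, m in zip(pixels, mask) if m]
    bg = [p for p, m in zip(pixels, mask) if not m]
    if not fg or not bg:
        return list(pixels)  # nothing to match against
    fg_mean, fg_std = fmean(fg), pstdev(fg)
    bg_mean, bg_std = fmean(bg), pstdev(bg)
    # One scale and one shift for the whole foreground: a purely global transform.
    scale = bg_std / (fg_std + eps)
    return [
        (p - fg_mean) * scale + bg_mean if m else p
        for p, m in zip(pixels, mask)
    ]

# Toy composite: a bright pasted foreground on a darker background.
pixels = [0.2, 0.3, 0.9, 0.7, 0.1, 0.4]
mask = [0, 0, 1, 1, 0, 0]
result = harmonize_global(pixels, mask)
```

Because a single scale and shift are applied to every foreground pixel, this sketch cannot respond to appearance differences between background regions, which is the limitation the abstract says motivates region-aware and local mechanisms.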




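Source journal

Neural Networks (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 13.90

Self-citation rate: 7.70%

Articles published: 425

Review time: 67 days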

About the journal: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.