Unified Image Harmonization with Region Augmented Attention Normalization
Authors: Junjie Hou, Yuqi Zhang, Duo Su
Journal: Annals of Data Science (JCR Q1, Decision Sciences)
Publication type: Journal Article
Published: 2024-05-11
DOI: 10.1007/s40745-024-00531-6
URL: https://link.springer.com/article/10.1007/s40745-024-00531-6
Citations: 0
Abstract
Image harmonization adjusts the foreground of a synthesized image so that it is visually consistent with the background. In academic research, the task conventionally takes a simple synthesized image and a matching mask as inputs. In practical applications, however, precise masks are difficult to obtain, which creates a notable gap between research findings and real-world applicability. To narrow this gap, we redefine the task as “Unified Image Harmonization,” in which the input is a single image only. For this setting we develop a novel framework that first applies inharmonious region localization to detect the mask and then uses that mask for harmonization. The pivotal component of harmonization is normalization, which is responsible for information transfer. Existing background-to-foreground transfer and guidance mechanisms, however, rely on single-layer guidance, which limits their effectiveness. To overcome this limitation, we introduce Region Augmented Attention Normalization (RA2N), which enhances the attention mechanism for foreground feature alignment and thereby improves alignment and transfer. Qualitative and quantitative comparisons on the iHarmony4 dataset show that our model performs strongly in both unified and conventional image harmonization.
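The two-stage flow described in the abstract (localize the inharmonious region, then transfer background information to the foreground through normalization) can be illustrated with a minimal, dependency-free Python sketch. It is not RA2N: it uses plain masked mean/std matching on a flattened feature channel, with no attention and no learned parameters. The names `masked_stats`, `transfer_background_stats`, `harmonize_single_image`, and the caller-supplied `localize` callable are hypothetical and do not come from the paper.

```python
from math import sqrt


def masked_stats(values, mask, eps=1e-5):
    """Mean and std of `values` over the positions where `mask` is 1."""
    selected = [v for v, m in zip(values, mask) if m]
    n = max(len(selected), 1)  # avoid division by zero for an empty region
    mean = sum(selected) / n
    var = sum((v - mean) ** 2 for v in selected) / n
    return mean, sqrt(var + eps)


def transfer_background_stats(channel, fg_mask):
    """Re-normalize foreground values of one channel to background statistics.

    channel: flat list of floats (one feature channel, H*W values).
    fg_mask: flat list of 0/1 flags, 1 = inharmonious foreground (assumed).
    """
    bg_mask = [1 - m for m in fg_mask]
    fg_mean, fg_std = masked_stats(channel, fg_mask)
    bg_mean, bg_std = masked_stats(channel, bg_mask)
    # Whiten the foreground with its own statistics, then re-color it with
    # the background statistics; background values pass through unchanged.
    return [
        (v - fg_mean) / fg_std * bg_std + bg_mean if m else v
        for v, m in zip(channel, fg_mask)
    ]


def harmonize_single_image(channels, localize):
    """Single-input pipeline: predict the mask first, then normalize with it.

    `localize` is a hypothetical stand-in for an inharmonious region
    localization model; it must return a flat 0/1 mask.
    """
    mask = localize(channels)
    return [transfer_background_stats(c, mask) for c in channels], mask
```

For example, calling `harmonize_single_image([channel], localize)` with any function that returns a 0/1 mask leaves the background values untouched and shifts the masked values toward the background mean and spread. A learned model would replace both the localizer and the fixed statistics matching.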
About the journal:
Annals of Data Science (ADS) publishes cutting-edge research findings, experimental results and case studies in data science. Although Data Science is regarded as an interdisciplinary field that uses mathematics, statistics, databases, data mining, high-performance computing, knowledge management and virtualization to discover knowledge from Big Data, it should have its own scientific content, such as axioms, laws and rules, which are fundamentally important for experts in different fields who want to explore their own interests in Big Data. ADS encourages contributors to address such challenging problems on this exchange platform. At present, the question of how to discover knowledge from heterogeneous data in a Big Data environment needs to be addressed. ADS is a series of volumes edited by either the editorial office or guest editors. Guest editors are responsible for the call for papers and the review process for high-quality contributions in their volumes.