MagicStyle: Portrait Stylization Based on Reference Image

Zhaoli Deng, Kaibin Zhou, Fanyi Wang, Zhenpeng Mi

arXiv - CS - Computer Vision and Pattern Recognition, published 2024-09-12
DOI: https://doi.org/arxiv-2409.08156
Abstract
The development of diffusion models has significantly advanced research on image stylization, particularly the task of stylizing a content image based on a given style image, which has attracted considerable attention. The main challenge in this reference-based stylization task lies in maintaining the details of the content image while incorporating the color and texture features of the style image. The challenge becomes even more pronounced when the content image is a portrait, which has complex textural details. To address this, we propose a diffusion model-based reference image stylization method specifically for portraits, called MagicStyle. MagicStyle consists of two phases: Content and Style DDIM Inversion (CSDI) and Feature Fusion Forward (FFF). The CSDI phase performs a reverse denoising process, applying DDIM Inversion separately to the content image and the style image while storing the self-attention query, key, and value features of both images during inversion. The FFF phase executes forward denoising, harmoniously integrating the texture and color information from the pre-stored feature queries, keys, and values into the diffusion generation process via our well-designed Feature Fusion Attention (FFA). We conducted comprehensive comparative and ablation experiments to validate the effectiveness of our proposed MagicStyle and FFA.
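The abstract does not give the exact FFA formula, but the described mechanism, fusing pre-stored content and style self-attention features during forward denoising, can be illustrated with a minimal NumPy sketch. Here the content query attends over the concatenation of content and style keys/values; the function name, the concatenation strategy, and all tensor shapes are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def feature_fusion_attention(q_content, k_content, v_content, k_style, v_style):
    """Hypothetical sketch of feature-fusion attention.

    q_content: (N, d) queries from the content image's stored features.
    k_*, v_*:  (N, d) keys/values stored during DDIM inversion of the
               content and style images, respectively.
    Each content query attends jointly to content keys (preserving
    structure) and style keys (injecting color/texture).
    """
    k = np.concatenate([k_content, k_style], axis=0)  # (2N, d)
    v = np.concatenate([v_content, v_style], axis=0)  # (2N, d)
    d = q_content.shape[-1]
    attn = softmax(q_content @ k.T / np.sqrt(d))      # (N, 2N), rows sum to 1
    return attn @ v                                   # (N, d) fused features
```

In an actual diffusion pipeline, an operation of this kind would replace the self-attention at selected U-Net layers and timesteps during the forward (generation) pass, reading the Q/K/V tensors cached in the inversion phase.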