Title: InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models
Authors: Yan Zheng, Lemeng Wu
Venue: arXiv - CS - Computer Vision and Pattern Recognition
DOI: https://doi.org/arxiv-2409.11734
Published: 2024-09-18
Citations: 0
Abstract
In this paper, we introduce Geometry-Inverse-Meet-Pixel-Insert (GEO), a versatile image editing technique designed to meet customized user requirements at both local and global scales. Our approach seamlessly integrates text prompts and image prompts to yield diverse and precise editing outcomes. Notably, our method requires no training and rests on two key contributions: (i) a novel geometric accumulation loss that enhances DDIM inversion to faithfully preserve pixel-space geometry and layout, and (ii) a boosted image prompt technique that combines pixel-level editing for text-only inversion with latent-space geometry guidance for standard classifier-free reversion. Built on the publicly available Stable Diffusion model, our approach is evaluated extensively across various image types and challenging prompt editing scenarios, consistently delivering high-fidelity editing results for real images.
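The abstract's first contribution builds on DDIM inversion: the deterministic forward mapping of a real image to a diffusion latent, so that editing can start from a latent that faithfully reconstructs the input. As background, the sketch below shows plain DDIM inversion and its reverse sampling pass in NumPy. It is not the GEO method itself: the geometric accumulation loss and the Stable Diffusion UNet are omitted, and `eps_model` is a hypothetical stand-in noise predictor.

```python
import numpy as np

# Toy sketch of plain DDIM inversion, the procedure the paper's geometric
# accumulation loss is said to enhance. No Stable Diffusion UNet here; the
# noise predictor is a deterministic stand-in, and the geometric
# accumulation loss is omitted.

def make_alpha_bar(num_steps=50):
    """Cumulative product of (1 - beta) for a linear beta schedule."""
    betas = np.linspace(1e-4, 2e-2, num_steps)
    return np.cumprod(1.0 - betas)

def ddim_invert(x0, eps_model, alpha_bar):
    """Deterministically map a clean image x0 forward to a noisy latent x_T."""
    x = x0.copy()
    for t in range(len(alpha_bar) - 1):
        a_t, a_next = alpha_bar[t], alpha_bar[t + 1]
        eps = eps_model(x, t)
        # Predict the clean image, then re-noise it to the next noise level.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps
    return x

def ddim_sample(xT, eps_model, alpha_bar):
    """Reverse (generative) DDIM pass from latent x_T back to an image."""
    x = xT.copy()
    for t in range(len(alpha_bar) - 1, 0, -1):
        a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
        eps = eps_model(x, t - 1)
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((8, 8))
    # Toy predictor that depends only on t, so inversion followed by sampling
    # reconstructs the input exactly; with a real UNet the round trip is only
    # approximate, which is what reconstruction-preserving losses address.
    eps_model = lambda x, t: 0.05 * (t + 1) * np.ones_like(x)
    a_bar = make_alpha_bar()
    latent = ddim_invert(image, eps_model, a_bar)
    recon = ddim_sample(latent, eps_model, a_bar)
    print(np.max(np.abs(recon - image)))  # near machine precision
```

With a real text-conditioned model, the predicted noise at each step also depends on the prompt (via classifier-free guidance), and the forward/backward noise estimates no longer cancel exactly; the paper's geometric accumulation loss targets precisely this reconstruction drift in pixel space.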