Yunji Jung, Seokju Lee, Tair Djanibekov, Jong Chul Ye, Hyunjung Shim
Title: Text optimization with latent inversion for non-rigid image editing
Journal: Pattern Recognition Letters, Volume 196, Pages 281-288
Publication date: 2025-06-28
Publication type: Journal Article
DOI: 10.1016/j.patrec.2025.06.011
Article page: https://www.sciencedirect.com/science/article/pii/S0167865525002399
Impact factor: 3.3 (JCR Q2, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Text-guided non-rigid image editing involves complex edits to input images, such as changing the motion or composition of an object (e.g., making a horse jump or adding candles to a cake). Because such edits require manipulating the structure of the object, existing methods often compromise “image identity”, defined as the overall object appearance and background details, particularly when combined with Stable Diffusion. In this work, we propose a new approach for non-rigid image editing with Stable Diffusion that aims to improve identity preservation without compromising editability. Our approach comprises three stages: text optimization, latent inversion, and timestep-aware text injection sampling. Inspired by the success of Imagic, we employ its text optimization for smooth editing. We then introduce latent inversion to preserve the input image’s identity without additional model fine-tuning. To fully utilize the input reconstruction ability of latent inversion, we employ timestep-aware text injection sampling, which injects the source text prompt in early sampling steps and then transitions to the target prompt in subsequent sampling steps. This sampling strategy works together with text optimization, enabling complex non-rigid edits to the input without losing the original identity. Extensive experiments demonstrate the effectiveness of our method in terms of identity preservation, editability, and aesthetic quality. Our code is available at https://github.com/YunjiJung0105/TOLI-non-rigid-editing.
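The timestep-aware text injection described in the abstract can be pictured with a small Python sketch. It is illustrative only and is not taken from the authors' repository: the function names, the `source_fraction` parameter, and the hard switch from source to target prompt are all assumptions, since the abstract says only that the source prompt is used in early sampling steps and the target prompt in later ones.

```python
from typing import List, Sequence


def pick_prompt_per_step(num_steps: int, source_fraction: float) -> List[str]:
    """Label each sampling step with the prompt that conditions it.

    source_fraction is a hypothetical knob (not from the paper): the share
    of early sampling steps that use the source prompt before the sampler
    switches to the target prompt.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if not 0.0 <= source_fraction <= 1.0:
        raise ValueError("source_fraction must lie in [0, 1]")
    # Index of the first step that uses the target prompt,
    # obtained with round() to the nearest integer step.
    switch_step = round(num_steps * source_fraction)
    return ["source" if step < switch_step else "target"
            for step in range(num_steps)]


def conditioning_for_step(step: int,
                          schedule: Sequence[str],
                          source_emb: Sequence[float],
                          target_emb: Sequence[float]) -> Sequence[float]:
    """Return the text embedding a denoiser would receive at this step.

    Embeddings are plain sequences of floats here so the sketch stays
    dependency-free; a real pipeline would pass text-encoder tensors.
    """
    return source_emb if schedule[step] == "source" else target_emb


# Hypothetical setting: 10 sampling steps, the first 30% on the source prompt.
schedule = pick_prompt_per_step(num_steps=10, source_fraction=0.3)
print(schedule)
```

In a full pipeline, the embedding returned by `conditioning_for_step` would be passed to the denoiser at each step, starting from the inverted latent. How the actual method chooses the transition point, and whether the transition is abrupt or gradual, is not stated in the abstract.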
About the journal:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.