{"title":"Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation","authors":"Junsung Lee, Minsoo Kang, Bohyung Han","doi":"arxiv-2409.08077","DOIUrl":null,"url":null,"abstract":"We propose a simple but effective training-free approach tailored to\ndiffusion-based image-to-image translation. Our approach revises the original\nnoise prediction network of a pretrained diffusion model by introducing a noise\ncorrection term. We formulate the noise correction term as the difference\nbetween two noise predictions; one is computed from the denoising network with\na progressive interpolation of the source and target prompt embeddings, while\nthe other is the noise prediction with the source prompt embedding. The final\nnoise prediction network is given by a linear combination of the standard\ndenoising term and the noise correction term, where the former is designed to\nreconstruct must-be-preserved regions while the latter aims to effectively edit\nregions of interest relevant to the target prompt. Our approach can be easily\nincorporated into existing image-to-image translation methods based on\ndiffusion models. Extensive experiments verify that the proposed technique\nachieves outstanding performance with low latency and consistently improves\nexisting frameworks when combined with them.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08077","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
We propose a simple but effective training-free approach tailored to diffusion-based image-to-image translation. Our approach revises the original noise prediction network of a pretrained diffusion model by introducing a noise correction term. We formulate the noise correction term as the difference between two noise predictions: one is computed from the denoising network with a progressive interpolation of the source and target prompt embeddings, while the other is the noise prediction with the source prompt embedding. The final noise prediction is given by a linear combination of the standard denoising term and the noise correction term, where the former is designed to reconstruct regions that must be preserved and the latter aims to effectively edit the regions of interest relevant to the target prompt. Our approach can be easily incorporated into existing diffusion-based image-to-image translation methods. Extensive experiments verify that the proposed technique achieves outstanding performance with low latency and consistently improves existing frameworks when combined with them.
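
The abstract describes the corrected prediction as a linear combination of the standard denoising term (conditioned on the source prompt) and a correction term given by the difference between a prediction under a progressively interpolated prompt embedding and the source-prompt prediction. Below is a minimal sketch of that combination, assuming a generic noise-prediction function `eps_theta(x_t, t, emb)`, a linear interpolation schedule `gamma`, and a mixing weight `lam`; these names and the exact schedule are illustrative assumptions, not details taken from the paper.

```python
import torch


def interpolate_prompts(src_emb: torch.Tensor, tgt_emb: torch.Tensor, gamma: float) -> torch.Tensor:
    """Progressively interpolate source and target prompt embeddings.

    gamma in [0, 1]: 0 keeps the source prompt, 1 reaches the target prompt.
    (The actual schedule used in the paper is not specified in the abstract.)
    """
    return (1.0 - gamma) * src_emb + gamma * tgt_emb


def corrected_noise_prediction(eps_theta, x_t, t, src_emb, tgt_emb, gamma: float, lam: float):
    """Combine the standard denoising term with a noise correction term.

    eps_theta: pretrained noise-prediction network, called as eps_theta(x_t, t, emb)
    gamma:     interpolation coefficient for the current timestep (assumed schedule)
    lam:       weight of the correction term (hypothetical hyperparameter)
    """
    # Standard denoising term with the source prompt embedding:
    # intended to reconstruct regions that must be preserved.
    eps_src = eps_theta(x_t, t, src_emb)

    # Prediction with the progressively interpolated prompt embedding.
    interp_emb = interpolate_prompts(src_emb, tgt_emb, gamma)
    eps_interp = eps_theta(x_t, t, interp_emb)

    # Noise correction term: difference between the two predictions,
    # aimed at editing regions relevant to the target prompt.
    correction = eps_interp - eps_src

    # Final prediction: linear combination of the two terms.
    return eps_src + lam * correction


if __name__ == "__main__":
    # Dummy stand-in for a pretrained noise-prediction network, for illustration only.
    def dummy_eps_theta(x_t, t, emb):
        return x_t + emb.mean()

    x_t = torch.randn(1, 4, 64, 64)
    src_emb = torch.randn(77, 768)
    tgt_emb = torch.randn(77, 768)
    eps = corrected_noise_prediction(dummy_eps_theta, x_t, t=500, src_emb=src_emb,
                                     tgt_emb=tgt_emb, gamma=0.5, lam=1.0)
    print(eps.shape)
```

How the interpolation coefficient evolves over the denoising trajectory and how the correction weight is chosen are design choices the abstract does not spell out; the sketch simply treats both as free parameters supplied per timestep.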