Dissecting and Mitigating Semantic Discrepancy in Stable Diffusion for Image-to-Image Translation

IF 19.2 · Zone 1 (Computer Science) · Q1 (Automation & Control Systems)

IEEE/CAA Journal of Automatica Sinica · Pub Date: 2025-04-01 · DOI: 10.1109/JAS.2024.124800

Yifan Yuan;Guanqun Yang;James Z. Wang;Hui Zhang;Hongming Shan;Fei-Yue Wang;Junping Zhang

Vol. 12, No. 4, pp. 705-718 · IEEE Xplore: https://ieeexplore.ieee.org/document/10946088/

Citations: 0

Dissecting and Mitigating Semantic Discrepancy in Stable Diffusion for Image-to-Image Translation

Finding suitable initial noise that retains the original image's information is crucial for image-to-image (I2I) translation using text-to-image (T2I) diffusion models. A common approach is to add random noise directly to the original image, as in SDEdit. However, we have observed that this can result in “semantic discrepancy” issues, wherein T2I diffusion models misinterpret the semantic relationships and generate content not present in the original image. We identify that the noise introduced by SDEdit disrupts the semantic integrity of the image, leading to unintended associations between unrelated regions after U-Net upsampling. Building on the widely used latent diffusion model, Stable Diffusion, we propose a training-free, plug-and-play method to alleviate semantic discrepancy and enhance the fidelity of the translated image. By leveraging the deterministic nature of denoising diffusion implicit model (DDIM) inversion, we replace the erroneous features and correlations from the original generative process with accurate ones from DDIM inversion. This approach alleviates semantic discrepancy and surpasses recent DDIM-inversion-based methods such as PnP with fewer priors, achieving a speedup of 11.2 times in experiments conducted on the COCO, ImageNet, and ImageNet-R datasets across multiple I2I translation tasks. The code is available at https://github.com/Sherlockyyf/Semantic_Discrepancy.
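The contrast the abstract draws between SDEdit-style noising and DDIM inversion can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear beta schedule, the function names, and `toy_eps_model` (a hypothetical stand-in for the U-Net noise predictor) are all assumptions made for illustration, and real Stable Diffusion operates on latents rather than raw arrays.

```python
import numpy as np

def make_alpha_bars(num_steps=50, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def sdedit_noise(x0, t, alpha_bars, rng):
    """SDEdit-style start point: jump to step t by adding fresh Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def toy_eps_model(x, t):
    """Hypothetical stand-in for the U-Net noise predictor (NOT a trained model)."""
    return 0.1 * x

def ddim_invert(x0, t_end, alpha_bars, eps_model):
    """Deterministic DDIM inversion: walk x0 up to step t_end with no random draws."""
    x = x0.copy()
    a_prev = 1.0  # alpha_bar of the clean input
    for t in range(t_end + 1):
        eps = eps_model(x, t)
        # Estimate the clean signal at the current noise level, then re-noise
        # it to the next level using the same predicted noise.
        x0_pred = (x - np.sqrt(1.0 - a_prev) * eps) / np.sqrt(a_prev)
        x = np.sqrt(alpha_bars[t]) * x0_pred + np.sqrt(1.0 - alpha_bars[t]) * eps
        a_prev = alpha_bars[t]
    return x

alpha_bars = make_alpha_bars()
x0 = np.ones((4, 4))
inv_a = ddim_invert(x0, 30, alpha_bars, toy_eps_model)
inv_b = ddim_invert(x0, 30, alpha_bars, toy_eps_model)
noisy_a = sdedit_noise(x0, 30, alpha_bars, np.random.default_rng(0))
noisy_b = sdedit_noise(x0, 30, alpha_bars, np.random.default_rng(1))
```

In this sketch, repeated inversions of the same input coincide exactly, while two SDEdit draws differ with the random seed. That determinism is the property the abstract says the method relies on when taking reference features from DDIM inversion.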

Source journal

IEEE/CAA Journal of Automatica Sinica (Engineering: Control and Systems Engineering)

CiteScore: 23.50
Self-citation rate: 11.00%
Articles published: 880

About the journal: The IEEE/CAA Journal of Automatica Sinica publishes English-language papers on original theoretical and experimental research and development in the field of automation. Its scope includes automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control. The journal is abstracted and indexed in SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.