Denoising Diffusion Models on Model-Based Latent Space

IF 1.8 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Algorithms, Pub Date: 2023-10-28, DOI: 10.3390/a16110501

Carmelo Scribano, Danilo Pezzi, Giorgia Franchini, Marco Prato

Citations: 0

Abstract

Denoising Diffusion Models on Model-Based Latent Space

With the recent advancements in the field of diffusion generative models, it has been shown that defining the generative process in the latent space of a powerful pretrained autoencoder can offer substantial advantages. This approach, by abstracting away imperceptible image details and introducing substantial spatial compression, renders the learning of the generative process more manageable while significantly reducing computational and memory demands. In this work, we propose to replace autoencoder coding with a model-based coding scheme based on traditional lossy image compression techniques; this choice not only further diminishes computational expenses but also allows us to probe the boundaries of latent-space image generation. Our objectives culminate in the proposal of a valuable approximation for training continuous diffusion models within a discrete space, accompanied by enhancements to the generative model for categorical values. Beyond the good results obtained for the problem at hand, we believe that the proposed work holds promise for enhancing the adaptability of generative diffusion models across diverse data types beyond the realm of imagery.
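
The abstract does not say which lossy compression scheme provides the model-based latent space. As a purely illustrative sketch, assuming a JPEG-style pipeline (8x8 block DCT followed by uniform quantization), the snippet below shows how such a coder maps pixels to discrete integer codes and back without any learned autoencoder. The block size, the quantization step, and the function names `encode_block` and `decode_block` are assumptions made for this example, not details taken from the paper.

```python
import math

N = 8  # block size (assumption: JPEG-style 8x8 blocks)

# Orthonormal DCT-II basis: C[k][n] = a(k) * cos(pi * (2n + 1) * k / (2N))
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

CT = transpose(C)

def encode_block(block, step=16.0):
    """Map an 8x8 pixel block to integer codes: 2-D DCT, then uniform quantization."""
    coeffs = matmul(matmul(C, block), CT)
    return [[round(c / step) for c in row] for row in coeffs]

def decode_block(codes, step=16.0):
    """Invert encode_block up to quantization error: dequantize, then inverse DCT."""
    coeffs = [[q * step for q in row] for row in codes]
    return matmul(matmul(CT, coeffs), C)

# A smooth gradient block with pixel values in [0, 255]
block = [[float(16 * x + 2 * y) for x in range(N)] for y in range(N)]
codes = encode_block(block)   # discrete "latent" representation
recon = decode_block(codes)   # lossy reconstruction
max_err = max(abs(block[y][x] - recon[y][x])
              for y in range(N) for x in range(N))
```

In this toy setting the codes are integers, which is the kind of discrete space the abstract refers to when it mentions training continuous diffusion models within a discrete space. For a smooth block like this one, most codes away from the first row and column are zero, which is where the compression comes from.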

Source journal