SinDiffusion: Learning a Diffusion Model From a Single Natural Image

Weilun Wang; Jianmin Bao; Wengang Zhou; Dongdong Chen; Dong Chen; Lu Yuan; Houqiang Li

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 5, pp. 3412-3423. Published online 2025-01-27. DOI: 10.1109/TPAMI.2025.3532956.
We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches from a single natural image. The default approach of previous GAN-based methods for this problem is to train multiple models at progressively growing scales, which leads to an accumulation of errors and causes characteristic artifacts in the generated results. In this paper, we show that multiple models at progressively growing scales are not essential for learning from a single image, and we propose SinDiffusion, a single diffusion-based model trained at a single scale, which is better suited to this task. Furthermore, we identify that a patch-level receptive field is crucial for diffusion models to capture the image's patch statistics; we therefore redesign a patch-wise denoising network for SinDiffusion. Coupling these two designs enables SinDiffusion to generate more photorealistic and diverse images from a single image than GAN-based approaches. SinDiffusion also supports applications beyond the capability of SinGAN, e.g., text-guided image generation and image outpainting. Extensive experiments on a wide range of images demonstrate the superiority of SinDiffusion for modeling the patch distribution.
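The two ingredients the abstract names are standard and easy to illustrate in isolation: (1) the closed-form diffusion forward process that produces the noisy training inputs, and (2) the fact that a shallow stride-1 convolutional stack only "sees" a patch of the input, which is what restricts the denoiser to patch statistics. The sketch below is generic DDPM background under common default choices (a linear beta schedule from 1e-4 to 0.02 over 1000 steps), not the paper's actual network or hyperparameters; the receptive-field formula applies to plain stride-1 convolutions without dilation.

```python
import numpy as np

# Closed-form DDPM forward process: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps,
# using a common linear beta schedule (an assumption, not the paper's setting).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # the single training image, in [-1, 1]
t = 500
eps = rng.standard_normal(x0.shape)
x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
# A denoiser would be trained to predict eps from (x_t, t).

# Receptive field of a plain stride-1 conv stack: rf = 1 + sum(k - 1).
# This is what bounds how large a patch each output pixel depends on.
def receptive_field(kernel_sizes):
    return 1 + sum(k - 1 for k in kernel_sizes)

# e.g. five 3x3 convs see only an 11x11 patch of the input image
print(receptive_field([3] * 5))  # -> 11
```

Because each output pixel of such a stack depends only on an 11x11 input patch, the network can model the distribution of patches without memorizing the global layout of the single image; a deeper or downsampling network would widen this window and push the model toward reproducing the whole image.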