{"title":"Direct Distillation: A Novel Approach for Efficient Diffusion Model Inference.","authors":"Zilai Li, Rongkai Zhang","doi":"10.3390/jimaging11020066","DOIUrl":null,"url":null,"abstract":"<p><p>Diffusion models are among the most common techniques used for image generation, having achieved state-of-the-art performance by implementing auto-regressive algorithms. However, multi-step inference processes are typically slow and require extensive computational resources. To address this issue, we propose the use of an information bottleneck to reschedule inference using a new sampling strategy, which employs a lightweight distilled neural network to map intermediate stages to the final output. This approach reduces the number of iterations and FLOPS required for inference while ensuring the diversity of generated images. A series of validation experiments were conducted involving the COCO dataset as well as the LAION dataset and two proposed distillation models, requiring 57.5 million and 13.5 million parameters, respectively. Results showed that these models were able to bypass 40-50% of the inference steps originally required by a stable U-Net diffusion model, which included 859 million parameters. In the original sampling process, each inference step required 67,749 million multiply-accumulate operations (MACs), while our two distillate models only required 3954 million MACs and 3922 million MACs per inference step. In addition, our distillation algorithm produced a Fréchet inception distance (FID) of 16.75 in eight steps, which was remarkably lower than those of the progressive distillation, adversarial distillation, and DDIM solver algorithms, which produced FID values of 21.0, 30.0, 22.3, and 24.0, respectively. Notably, this process did not require parameters from the original diffusion model to establish a new distillation model prior to training. Information theory was used to further analyze primary bottlenecks in the FID results of existing distillation algorithms, demonstrating that both GANs and typical distillation failed to achieve generative diversity while implicitly studying incorrect posterior probability distributions. Meanwhile, we use information theory to analyze the latest distillation models including LCM-SDXL, SDXL-Turbo, SDXL-Lightning, DMD, and MSD, which reveals the basic reason for the diversity problem confronted by them, and compare those distillation models with our algorithm in the FID and CLIP Score.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 2","pages":""},"PeriodicalIF":2.7000,"publicationDate":"2025-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11856141/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jimaging11020066","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY","Score":null,"Total":0}
Abstract
Diffusion models are among the most common techniques used for image generation, achieving state-of-the-art performance through iterative, auto-regressive-style sampling. However, this multi-step inference process is typically slow and requires extensive computational resources. To address this issue, we propose using an information bottleneck to reschedule inference with a new sampling strategy, which employs a lightweight distilled neural network to map intermediate stages to the final output. This approach reduces the number of iterations and FLOPs required for inference while preserving the diversity of the generated images. A series of validation experiments was conducted on the COCO and LAION datasets with two proposed distillation models containing 57.5 million and 13.5 million parameters, respectively. Results showed that these models were able to bypass 40-50% of the inference steps originally required by a Stable Diffusion U-Net model with 859 million parameters. In the original sampling process, each inference step required 67,749 million multiply-accumulate operations (MACs), whereas our two distilled models required only 3954 million and 3922 million MACs per inference step. In addition, our distillation algorithm produced a Fréchet inception distance (FID) of 16.75 in eight steps, markedly lower than the FID values of 21.0, 30.0, 22.3, and 24.0 produced by competing approaches such as progressive distillation, adversarial distillation, and DDIM solvers. Notably, this process did not require parameters from the original diffusion model to initialize the distillation model prior to training. Information theory was used to further analyze the primary bottlenecks behind the FID results of existing distillation algorithms, demonstrating that both GANs and typical distillation methods fail to achieve generative diversity because they implicitly learn incorrect posterior probability distributions. We also applied the same information-theoretic analysis to recent distillation models, including LCM-SDXL, SDXL-Turbo, SDXL-Lightning, DMD, and MSD, revealing the root cause of the diversity problems they face, and compared these models with our algorithm in terms of FID and CLIP score.
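The abstract's core mechanism, a cheap distilled network that maps an intermediate diffusion latent directly to the final output so the remaining denoising steps can be skipped, can be sketched as follows. This is a minimal illustration under assumed names and shapes, not the paper's implementation: JumpNetwork, unet_step, the architecture, and the step counts are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

class JumpNetwork(nn.Module):
    """Hypothetical lightweight distilled network that maps an
    intermediate latent directly to the final latent. The tiny
    conv stack here is illustrative, not the paper's architecture."""
    def __init__(self, channels: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x_t: torch.Tensor) -> torch.Tensor:
        return self.net(x_t)

@torch.no_grad()
def sample_with_jump(unet_step, jump_net, x_T, total_steps=16, jump_at=8):
    """Run the first `total_steps - jump_at` ordinary denoising steps,
    then let the distilled network jump from the intermediate latent
    to the final output, bypassing the remaining `jump_at` steps."""
    x = x_T
    for t in reversed(range(jump_at, total_steps)):
        x = unet_step(x, t)   # expensive U-Net denoising step
    return jump_net(x)        # cheap one-shot jump to the final latent

if __name__ == "__main__":
    # Toy stand-in denoiser so the sketch runs end to end.
    fake_unet_step = lambda x, t: x * 0.95
    jump = JumpNetwork()
    x0 = sample_with_jump(fake_unet_step, jump, torch.randn(1, 4, 64, 64))
    print(x0.shape)  # torch.Size([1, 4, 64, 64])
```

Using the abstract's own figures, one pass of the larger distilled model (3954 million MACs) costs roughly 6% of a single U-Net step (67,749 million MACs), and that single pass replaces the 40-50% of the step schedule that is bypassed.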
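The abstract grounds its diversity analysis in the information bottleneck but does not state the paper's exact objective. For reference, the standard bottleneck formulation (Tishby et al.), with input X, target Y, learned representation Z, and trade-off weight β, is:

```latex
% Standard information-bottleneck objective; the paper's exact
% formulation is not given in the abstract.
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}}
  = I(X; Z) - \beta\, I(Z; Y),
\qquad
I(A; B) = \mathbb{E}_{p(a,b)}\!\left[\log \frac{p(a,b)}{p(a)\,p(b)}\right].
```

One reading consistent with the abstract's claim: a student trained only to regress the teacher's output keeps I(Z;Y) high for the posterior mean but discards the variability of the true posterior, which is the "incorrect posterior probability distribution" failure the abstract attributes to GANs and typical distillation.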