Direct Distillation: A Novel Approach for Efficient Diffusion Model Inference

IF 2.7 · Q3 · Imaging Science & Photographic Technology
Zilai Li, Rongkai Zhang
{"title":"Direct Distillation: A Novel Approach for Efficient Diffusion Model Inference.","authors":"Zilai Li, Rongkai Zhang","doi":"10.3390/jimaging11020066","DOIUrl":null,"url":null,"abstract":"<p><p>Diffusion models are among the most common techniques used for image generation, having achieved state-of-the-art performance by implementing auto-regressive algorithms. However, multi-step inference processes are typically slow and require extensive computational resources. To address this issue, we propose the use of an information bottleneck to reschedule inference using a new sampling strategy, which employs a lightweight distilled neural network to map intermediate stages to the final output. This approach reduces the number of iterations and FLOPS required for inference while ensuring the diversity of generated images. A series of validation experiments were conducted involving the COCO dataset as well as the LAION dataset and two proposed distillation models, requiring 57.5 million and 13.5 million parameters, respectively. Results showed that these models were able to bypass 40-50% of the inference steps originally required by a stable U-Net diffusion model, which included 859 million parameters. In the original sampling process, each inference step required 67,749 million multiply-accumulate operations (MACs), while our two distillate models only required 3954 million MACs and 3922 million MACs per inference step. In addition, our distillation algorithm produced a Fréchet inception distance (FID) of 16.75 in eight steps, which was remarkably lower than those of the progressive distillation, adversarial distillation, and DDIM solver algorithms, which produced FID values of 21.0, 30.0, 22.3, and 24.0, respectively. Notably, this process did not require parameters from the original diffusion model to establish a new distillation model prior to training. Information theory was used to further analyze primary bottlenecks in the FID results of existing distillation algorithms, demonstrating that both GANs and typical distillation failed to achieve generative diversity while implicitly studying incorrect posterior probability distributions. Meanwhile, we use information theory to analyze the latest distillation models including LCM-SDXL, SDXL-Turbo, SDXL-Lightning, DMD, and MSD, which reveals the basic reason for the diversity problem confronted by them, and compare those distillation models with our algorithm in the FID and CLIP Score.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 2","pages":""},"PeriodicalIF":2.7000,"publicationDate":"2025-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11856141/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jimaging11020066","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Diffusion models are among the most common techniques for image generation, having achieved state-of-the-art performance by means of auto-regressive sampling algorithms. However, multi-step inference is typically slow and requires extensive computational resources. To address this issue, we propose using an information bottleneck to reschedule inference with a new sampling strategy, in which a lightweight distilled neural network maps intermediate sampling states directly to the final output. This approach reduces the number of iterations and the FLOPs required for inference while preserving the diversity of generated images. A series of validation experiments was conducted on the COCO and LAION datasets with two proposed distillation models, containing 57.5 million and 13.5 million parameters, respectively. Results showed that these models were able to bypass 40-50% of the inference steps originally required by a Stable Diffusion U-Net model comprising 859 million parameters. In the original sampling process, each inference step required 67,749 million multiply-accumulate operations (MACs), while our two distilled models required only 3954 million and 3922 million MACs per inference step. In addition, our distillation algorithm produced a Fréchet inception distance (FID) of 16.75 in eight steps, markedly lower than the values of 21.0, 30.0, 22.3, and 24.0 produced by the progressive distillation, adversarial distillation, and DDIM solver algorithms. Notably, this process did not require parameters from the original diffusion model to initialize the distillation model prior to training. Information theory was used to further analyze the primary bottlenecks behind the FID results of existing distillation algorithms, showing that both GANs and typical distillation methods fail to achieve generative diversity because they implicitly learn incorrect posterior probability distributions. We also applied the same information-theoretic analysis to recent distillation models, including LCM-SDXL, SDXL-Turbo, SDXL-Lightning, DMD, and MSD, revealing the underlying cause of the diversity problem they face, and compared these models with our algorithm in terms of FID and CLIP Score.
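
The abstract only outlines the mechanism; the paper itself specifies the exact architecture and training objective. As a rough illustration of the general idea, the sketch below (in PyTorch; every name here, such as TinyStudent and teacher_sample, is invented for illustration and not taken from the paper) shows the shape of such a direct-distillation setup: a small student network is trained to map an intermediate latent from a truncated teacher sampling trajectory straight to the teacher's final output, so that the remaining sampler steps can be skipped at inference time.

# Illustrative sketch only (not the authors' code): a lightweight student
# that maps an intermediate diffusion latent x_t directly to the final
# latent, replacing the remaining sampler steps. Assumes PyTorch; the
# architecture, names, and hyperparameters are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyStudent(nn.Module):
    """Small conv net: intermediate latent + timestep -> final latent."""
    def __init__(self, channels=4, hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels + 1, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x_t, t):
        # Broadcast the normalized timestep as one extra input channel.
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:])
        return self.body(torch.cat([x_t, t_map], dim=1))

def distill_step(student, teacher_sample, optimizer, noise,
                 num_steps=16, cut=8):
    """One training step: regress the student's one-shot prediction from
    the step-`cut` latent onto the teacher's full `num_steps` output.
    `teacher_sample(noise, steps=k)` is a hypothetical deterministic
    sampler so both calls share the same trajectory prefix."""
    with torch.no_grad():
        x_cut = teacher_sample(noise, steps=cut)          # truncated trajectory
        x_final = teacher_sample(noise, steps=num_steps)  # full-trajectory target
    t = torch.full((noise.shape[0],), cut / num_steps, device=noise.device)
    loss = F.mse_loss(student(x_cut, t), x_final)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

On the abstract's own figures, every bypassed teacher step costs 67,749 million MACs, while one pass of the larger distilled model costs 3954 million MACs, roughly a 17x compute reduction over the portion of the trajectory the student replaces.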

Source journal: Journal of Imaging (Medicine: Radiology, Nuclear Medicine and Imaging)
CiteScore: 5.90
Self-citation rate: 6.20%
Articles per year: 303
Review time: 7 weeks