Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, Bastian Leibe
{"title":"微调图像条件扩散模型比想象中更容易","authors":"Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, Bastian Leibe","doi":"arxiv-2409.11355","DOIUrl":null,"url":null,"abstract":"Recent work showed that large diffusion models can be reused as highly\nprecise monocular depth estimators by casting depth estimation as an\nimage-conditional image generation task. While the proposed model achieved\nstate-of-the-art results, high computational demands due to multi-step\ninference limited its use in many scenarios. In this paper, we show that the\nperceived inefficiency was caused by a flaw in the inference pipeline that has\nso far gone unnoticed. The fixed model performs comparably to the best\npreviously reported configuration while being more than 200$\\times$ faster. To\noptimize for downstream task performance, we perform end-to-end fine-tuning on\ntop of the single-step model with task-specific losses and get a deterministic\nmodel that outperforms all other diffusion-based depth and normal estimation\nmodels on common zero-shot benchmarks. We surprisingly find that this\nfine-tuning protocol also works directly on Stable Diffusion and achieves\ncomparable performance to current state-of-the-art diffusion-based depth and\nnormal estimation models, calling into question some of the conclusions drawn\nfrom prior works.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think\",\"authors\":\"Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, Bastian Leibe\",\"doi\":\"arxiv-2409.11355\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent work showed that large diffusion models can be reused as highly\\nprecise monocular depth estimators by casting depth estimation as an\\nimage-conditional image generation task. While the proposed model achieved\\nstate-of-the-art results, high computational demands due to multi-step\\ninference limited its use in many scenarios. In this paper, we show that the\\nperceived inefficiency was caused by a flaw in the inference pipeline that has\\nso far gone unnoticed. The fixed model performs comparably to the best\\npreviously reported configuration while being more than 200$\\\\times$ faster. To\\noptimize for downstream task performance, we perform end-to-end fine-tuning on\\ntop of the single-step model with task-specific losses and get a deterministic\\nmodel that outperforms all other diffusion-based depth and normal estimation\\nmodels on common zero-shot benchmarks. 
We surprisingly find that this\\nfine-tuning protocol also works directly on Stable Diffusion and achieves\\ncomparable performance to current state-of-the-art diffusion-based depth and\\nnormal estimation models, calling into question some of the conclusions drawn\\nfrom prior works.\",\"PeriodicalId\":501130,\"journal\":{\"name\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11355\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11355","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recent work showed that large diffusion models can be reused as highly precise monocular depth estimators by casting depth estimation as an image-conditional image generation task. While the proposed model achieved state-of-the-art results, high computational demands due to multi-step inference limited its use in many scenarios. In this paper, we show that the perceived inefficiency was caused by a flaw in the inference pipeline that has so far gone unnoticed. The fixed model performs comparably to the best previously reported configuration while being more than 200$\times$ faster. To optimize for downstream task performance, we perform end-to-end fine-tuning on top of the single-step model with task-specific losses and get a deterministic model that outperforms all other diffusion-based depth and normal estimation models on common zero-shot benchmarks. We surprisingly find that this fine-tuning protocol also works directly on Stable Diffusion and achieves comparable performance to current state-of-the-art diffusion-based depth and normal estimation models, calling into question some of the conclusions drawn from prior works.
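The abstract describes replacing multi-step diffusion sampling with a single deterministic forward pass and then fine-tuning that single-step model end-to-end with task-specific losses. The following is a minimal, self-contained PyTorch sketch of that protocol only, under stated assumptions: the module and function names (TinyUNet, affine_invariant_loss, finetune_step) and the choice of an affine-invariant depth loss are illustrative placeholders, not the authors' actual implementation or code.

```python
# Minimal sketch: one deterministic forward pass for depth, then an
# end-to-end fine-tuning step with a task-specific (affine-invariant) loss.
# All names here are illustrative assumptions, not the paper's code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyUNet(nn.Module):
    """Stand-in for a pretrained diffusion denoiser (vastly simplified)."""
    def __init__(self, in_ch: int = 3, out_ch: int = 1, width: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, out_ch, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Single pass: the network output is used directly as the prediction,
        # with no iterative sampling loop.
        return self.net(x)


def affine_invariant_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Compare depth maps up to a per-image scale and shift (a common choice
    for zero-shot monocular depth; assumed here for illustration)."""
    b = pred.shape[0]
    p = pred.reshape(b, -1)
    t = target.reshape(b, -1)
    p_c = p - p.mean(1, keepdim=True)
    t_c = t - t.mean(1, keepdim=True)
    # Least-squares scale s and shift o aligning pred to target, per image.
    s = (p_c * t_c).sum(1) / (p_c ** 2).sum(1).clamp(min=1e-6)
    o = t.mean(1) - s * p.mean(1)
    aligned = s[:, None] * p + o[:, None]
    return F.l1_loss(aligned, t)


def finetune_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                  image: torch.Tensor, gt_depth: torch.Tensor) -> float:
    """One end-to-end fine-tuning step on the deterministic single-step model."""
    pred_depth = model(image)
    loss = affine_invariant_loss(pred_depth, gt_depth)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = TinyUNet()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-5)
    image = torch.rand(2, 3, 64, 64)        # dummy image batch
    gt_depth = torch.rand(2, 1, 64, 64)     # dummy ground-truth depth
    print(finetune_step(model, opt, image, gt_depth))
```

Because the prediction comes from a single forward pass, gradients from the task-specific loss flow through the whole network, which is what makes the end-to-end fine-tuning described in the abstract straightforward.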