EDM: a enhanced diffusion models for image restoration in complex scenes

The Visual Computer | Pub Date: 2024-07-24 | DOI: 10.1007/s00371-024-03549-2

JiaYan Wen, YuanSheng Zhuang, JunYi Deng


Citations: 0

Abstract



Diffusion models have achieved state-of-the-art performance by modeling image synthesis as a series of applications of a denoising network. Unlike image synthesis, image restoration (IR) aims to improve the subjective quality of images corrupted by various kinds of degradation. However, IR for complex scenes such as worksite images remains highly challenging in low-level vision because of complicated environmental factors. To address this problem, we propose an enhanced diffusion model for image restoration in complex scenes (EDM). It improves the authenticity and representation ability of the generation process while effectively handling complex backgrounds and diverse object types. EDM makes three main contributions: (1) Its framework adopts a Mish-based residual module, which enhances the ability to learn complex image patterns and allows negative gradients to be present, reducing the risk of overfitting during training. (2) It employs a mixed-head self-attention mechanism, which strengthens the correlation among input elements at each time step and better balances the capture of global structural information against local detailed textures. (3) EDM is evaluated on a self-built dataset tailored for worksite image restoration, named "Workplace", and the results are compared with those on two public datasets, Places2 and Rain100H. The experimental results on these datasets demonstrate not only EDM's application value in a specific domain but also its potential and versatility in broader image restoration tasks. Code, dataset and models are available at: https://github.com/Zhuangvictor0/EDM-A-Enhanced-Diffusion-Models-for-Image-Restoration-in-Complex-Scenes
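To make contribution (1) more concrete, the following is a minimal sketch of the Mish activation, mish(x) = x · tanh(softplus(x)), wrapped in a skip connection. The abstract does not describe the layout of EDM's residual module, so the `residual_block` helper and its `transform` argument are hypothetical illustrations, not the authors' implementation. The sketch shows only the general property the abstract alludes to: unlike ReLU, Mish lets small negative values pass through instead of zeroing them.

```python
import math
from typing import Callable, List


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    """ReLU, shown only for comparison: negative inputs become zero."""
    return max(x, 0.0)


def residual_block(
    x: List[float],
    transform: Callable[[List[float]], List[float]],
) -> List[float]:
    """Hypothetical residual unit: y_i = x_i + mish(transform(x)_i).

    `transform` stands in for whatever learned layer (for example a
    convolution) the real module would use; it is an assumption here.
    """
    branch = transform(x)
    return [xi + mish(bi) for xi, bi in zip(x, branch)]


if __name__ == "__main__":
    # Mish keeps a small negative response where ReLU outputs zero.
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={value:+.1f}  relu={relu(value):+.4f}  mish={mish(value):+.4f}")

    # Toy transform (assumed): scale every element by 0.5.
    print(residual_block([1.0, -1.0, 0.5], lambda v: [0.5 * e for e in v]))
```

Because the skip connection adds the input back unchanged, the block reduces to the identity whenever the branch output is zero.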

