{"title":"Lightweight Diffusion Models Based on Multi-Objective Evolutionary Neural Architecture Search.","authors":"Yu Xue, Chunxiao Jiao, Yong Zhang, Ali Wagdy Mohamed, Romany Fouad Mansour, Ferrante Neri","doi":"10.1142/S0129065725500595","DOIUrl":null,"url":null,"abstract":"<p><p>Diffusion models have achieved remarkable success in image generation, image super-resolution, and text-to-image synthesis. Despite their effectiveness, they face key challenges, notably long inference time and complex architectures that incur high computational costs. While various methods have been proposed to reduce inference steps and accelerate computation, the optimization of diffusion model architectures has received comparatively limited attention. To address this gap, we propose LDMOES (<b>L</b>ightweight <b>D</b>iffusion Models based on <b>M</b>ulti-<b>O</b>bjective <b>E</b>volutionary <b>S</b>earch), a framework that combines multi-objective evolutionary neural architecture search with knowledge distillation to design efficient UNet-based diffusion models. By adopting a modular search space, LDMOES effectively decouples architecture components for improved search efficiency. We validated our method on multiple datasets, including CIFAR-10, Tiny-ImageNet, CelebA-HQ [Formula: see text], and LSUN-church [Formula: see text]. Experiments show that LDMOES reduces multiply-accumulate operations (MACs) by approximately 40% in pixel space while outperforming the teacher model. When transferred to the larger-scale Tiny-ImageNet dataset, it still generates high-quality images with a competitive FID score of 4.16, demonstrating strong generalization ability. In latent space, MACs are reduced by about 50% with negligible performance loss. After transferring to the more complex LSUN-church dataset, the model surpasses baselines in generation quality while reducing computational cost by nearly 60%, validating the effectiveness and transferability of the multi-objective search strategy. Code and models will be available at https://github.com/GenerativeMind-arch/LDMOES.</p>","PeriodicalId":94052,"journal":{"name":"International journal of neural systems","volume":" ","pages":"2550059"},"PeriodicalIF":6.4000,"publicationDate":"2025-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of neural systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/S0129065725500595","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Diffusion models have achieved remarkable success in image generation, image super-resolution, and text-to-image synthesis. Despite their effectiveness, they face key challenges, notably long inference times and complex architectures that incur high computational costs. While various methods have been proposed to reduce the number of inference steps and accelerate computation, the optimization of diffusion model architectures has received comparatively little attention. To address this gap, we propose LDMOES (Lightweight Diffusion Models based on Multi-Objective Evolutionary Search), a framework that combines multi-objective evolutionary neural architecture search with knowledge distillation to design efficient UNet-based diffusion models. By adopting a modular search space, LDMOES decouples architecture components, improving search efficiency. We validated our method on multiple datasets, including CIFAR-10, Tiny-ImageNet, CelebA-HQ [Formula: see text], and LSUN-church [Formula: see text]. Experiments show that LDMOES reduces multiply-accumulate operations (MACs) by approximately 40% in pixel space while outperforming the teacher model. When transferred to the larger-scale Tiny-ImageNet dataset, it still generates high-quality images, achieving a competitive FID of 4.16 and demonstrating strong generalization ability. In latent space, MACs are reduced by about 50% with negligible performance loss. After transfer to the more complex LSUN-church dataset, the model surpasses the baselines in generation quality while reducing computational cost by nearly 60%, validating the effectiveness and transferability of the multi-objective search strategy. Code and models will be available at https://github.com/GenerativeMind-arch/LDMOES.
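The abstract describes the search recipe only at a high level, so a minimal sketch may help make it concrete. The Python below assumes an NSGA-II-style evolutionary loop over two minimized objectives, a quality proxy (e.g. distillation loss) and MACs, together with a hypothetical modular encoding in which each UNet stage's width and depth are searched independently, echoing the decoupled search space mentioned above. Every name, constant, and the toy `evaluate` function are illustrative assumptions, not the authors' implementation.

```python
import random
from dataclasses import dataclass

# Hypothetical modular search space: each UNet stage is encoded
# independently (width and depth), echoing the decoupled components
# described in the abstract. The choices below are illustrative.
CHANNEL_CHOICES = [64, 128, 192, 256]
DEPTH_CHOICES = [1, 2, 3]
NUM_STAGES = 4

@dataclass
class Candidate:
    channels: list
    depths: list
    objectives: tuple = None  # (quality proxy, MACs), both minimized

def random_candidate():
    return Candidate(
        channels=[random.choice(CHANNEL_CHOICES) for _ in range(NUM_STAGES)],
        depths=[random.choice(DEPTH_CHOICES) for _ in range(NUM_STAGES)],
    )

def mutate(parent, rate=0.25):
    # Per-stage mutation keeps the modular genes independent.
    child = Candidate(channels=parent.channels[:], depths=parent.depths[:])
    for i in range(NUM_STAGES):
        if random.random() < rate:
            child.channels[i] = random.choice(CHANNEL_CHOICES)
        if random.random() < rate:
            child.depths[i] = random.choice(DEPTH_CHOICES)
    return child

def evaluate(cand):
    # Toy stand-ins: a real run would distill the candidate against the
    # teacher UNet (quality proxy) and profile its MACs. Here, larger
    # models get a lower loss proxy, giving a genuine quality/cost
    # trade-off for the Pareto selection to exploit.
    macs = sum(c * d for c, d in zip(cand.channels, cand.depths))
    quality = 1.0 / (1.0 + macs) + 0.1 * random.random()
    cand.objectives = (quality, macs)

def dominates(a, b):
    # Pareto dominance: a is no worse on every objective and strictly
    # better on at least one.
    return (all(x <= y for x, y in zip(a.objectives, b.objectives))
            and any(x < y for x, y in zip(a.objectives, b.objectives)))

def pareto_front(pop):
    return [p for p in pop
            if not any(dominates(q, p) for q in pop if q is not p)]

def search(pop_size=20, generations=10):
    population = [random_candidate() for _ in range(pop_size)]
    for cand in population:
        evaluate(cand)
    for _ in range(generations):
        offspring = [mutate(random.choice(population)) for _ in range(pop_size)]
        for cand in offspring:
            evaluate(cand)
        # Environmental selection: non-dominated candidates survive first.
        # A full NSGA-II would break ties with crowding distance instead
        # of the simple truncation used here.
        merged = population + offspring
        front = pareto_front(merged)
        rest = sorted((c for c in merged if c not in front),
                      key=lambda c: c.objectives)
        population = (front + rest)[:pop_size]
    return pareto_front(population)

if __name__ == "__main__":
    for cand in search():
        print(cand.channels, cand.depths, cand.objectives)
```

Running `search()` prints a small Pareto front of architecture encodings; in the paper's setting, each evaluation would instead briefly train the candidate with distillation against the teacher UNet and profile its MACs with a FLOPs counter.

The distillation component can be sketched in the same hedged spirit. The loss below blends the standard DDPM denoising objective with a teacher-matching term; the epsilon-prediction formulation, the blend weight `alpha`, and the function names are assumptions rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def q_sample(x0, t, noise, alphas_cumprod):
    # Standard DDPM forward process:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

def distillation_loss(student, teacher, x0, t, alphas_cumprod, alpha=0.5):
    # Blend the usual denoising loss with a teacher-matching term.
    # `alpha` and the epsilon-prediction form are assumptions; LDMOES
    # may weight or formulate its distillation objective differently.
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise, alphas_cumprod)
    with torch.no_grad():           # the teacher stays frozen
        eps_teacher = teacher(x_t, t)
    eps_student = student(x_t, t)
    return (alpha * F.mse_loss(eps_student, noise)
            + (1.0 - alpha) * F.mse_loss(eps_student, eps_teacher))
```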