{"title":"Lightweight Diffusion Models Based on Multi-Objective Evolutionary Neural Architecture Search.","authors":"Yu Xue, Chunxiao Jiao, Yong Zhang, Ali Wagdy Mohamed, Romany Fouad Mansour, Ferrante Neri","doi":"10.1142/S0129065725500595","DOIUrl":null,"url":null,"abstract":"<p><p>Diffusion models have achieved remarkable success in image generation, image super-resolution, and text-to-image synthesis. Despite their effectiveness, they face key challenges, notably long inference time and complex architectures that incur high computational costs. While various methods have been proposed to reduce inference steps and accelerate computation, the optimization of diffusion model architectures has received comparatively limited attention. To address this gap, we propose LDMOES (<b>L</b>ightweight <b>D</b>iffusion Models based on <b>M</b>ulti-<b>O</b>bjective <b>E</b>volutionary <b>S</b>earch), a framework that combines multi-objective evolutionary neural architecture search with knowledge distillation to design efficient UNet-based diffusion models. By adopting a modular search space, LDMOES effectively decouples architecture components for improved search efficiency. We validated our method on multiple datasets, including CIFAR-10, Tiny-ImageNet, CelebA-HQ [Formula: see text], and LSUN-church [Formula: see text]. Experiments show that LDMOES reduces multiply-accumulate operations (MACs) by approximately 40% in pixel space while outperforming the teacher model. When transferred to the larger-scale Tiny-ImageNet dataset, it still generates high-quality images with a competitive FID score of 4.16, demonstrating strong generalization ability. In latent space, MACs are reduced by about 50% with negligible performance loss. After transferring to the more complex LSUN-church dataset, the model surpasses baselines in generation quality while reducing computational cost by nearly 60%, validating the effectiveness and transferability of the multi-objective search strategy. Code and models will be available at https://github.com/GenerativeMind-arch/LDMOES.</p>","PeriodicalId":94052,"journal":{"name":"International journal of neural systems","volume":" ","pages":"2550059"},"PeriodicalIF":6.4000,"publicationDate":"2025-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of neural systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/S0129065725500595","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Diffusion models have achieved remarkable success in image generation, image super-resolution, and text-to-image synthesis. Despite their effectiveness, they face key challenges, notably long inference times and complex architectures that incur high computational costs. While various methods have been proposed to reduce the number of inference steps and accelerate computation, the optimization of diffusion model architectures has received comparatively little attention. To address this gap, we propose LDMOES (Lightweight Diffusion Models based on Multi-Objective Evolutionary Search), a framework that combines multi-objective evolutionary neural architecture search with knowledge distillation to design efficient UNet-based diffusion models. By adopting a modular search space, LDMOES decouples architecture components, improving search efficiency. We validated our method on multiple datasets, including CIFAR-10, Tiny-ImageNet, CelebA-HQ [Formula: see text], and LSUN-church [Formula: see text]. Experiments show that LDMOES reduces multiply-accumulate operations (MACs) by approximately 40% in pixel space while outperforming the teacher model. When transferred to the larger-scale Tiny-ImageNet dataset, it still generates high-quality images, achieving a competitive FID of 4.16 and demonstrating strong generalization ability. In latent space, MACs are reduced by about 50% with negligible performance loss. After transfer to the more complex LSUN-church dataset, the model surpasses the baselines in generation quality while reducing computational cost by nearly 60%, validating the effectiveness and transferability of the multi-objective search strategy. Code and models will be available at https://github.com/GenerativeMind-arch/LDMOES.
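The abstract describes the search recipe only at a high level, so a minimal sketch may help make it concrete. The Python below assumes an NSGA-II-style evolutionary loop over two minimized objectives, a quality proxy (e.g. distillation loss) and MACs, together with a hypothetical modular encoding in which each UNet stage's width and depth are searched independently, echoing the decoupled search space mentioned above. Every name, constant, and the toy `evaluate` function are illustrative assumptions, not the authors' implementation.

```python
import random
from dataclasses import dataclass

# Hypothetical modular search space: each UNet stage is encoded
# independently (width and depth), echoing the decoupled components
# described in the abstract. The choices below are illustrative.
CHANNEL_CHOICES = [64, 128, 192, 256]
DEPTH_CHOICES = [1, 2, 3]
NUM_STAGES = 4

@dataclass
class Candidate:
    channels: list
    depths: list
    objectives: tuple = None  # (quality proxy, MACs), both minimized

def random_candidate():
    return Candidate(
        channels=[random.choice(CHANNEL_CHOICES) for _ in range(NUM_STAGES)],
        depths=[random.choice(DEPTH_CHOICES) for _ in range(NUM_STAGES)],
    )

def mutate(parent, rate=0.25):
    # Per-stage mutation keeps the modular genes independent.
    child = Candidate(channels=parent.channels[:], depths=parent.depths[:])
    for i in range(NUM_STAGES):
        if random.random() < rate:
            child.channels[i] = random.choice(CHANNEL_CHOICES)
        if random.random() < rate:
            child.depths[i] = random.choice(DEPTH_CHOICES)
    return child

def evaluate(cand):
    # Toy stand-ins: a real run would distill the candidate against the
    # teacher UNet (quality proxy) and profile its MACs. Here, larger
    # models get a lower loss proxy, giving a genuine quality/cost
    # trade-off for the Pareto selection to exploit.
    macs = sum(c * d for c, d in zip(cand.channels, cand.depths))
    quality = 1.0 / (1.0 + macs) + 0.1 * random.random()
    cand.objectives = (quality, macs)

def dominates(a, b):
    # Pareto dominance: a is no worse on every objective and strictly
    # better on at least one.
    return (all(x <= y for x, y in zip(a.objectives, b.objectives))
            and any(x < y for x, y in zip(a.objectives, b.objectives)))

def pareto_front(pop):
    return [p for p in pop
            if not any(dominates(q, p) for q in pop if q is not p)]

def search(pop_size=20, generations=10):
    population = [random_candidate() for _ in range(pop_size)]
    for cand in population:
        evaluate(cand)
    for _ in range(generations):
        offspring = [mutate(random.choice(population)) for _ in range(pop_size)]
        for cand in offspring:
            evaluate(cand)
        # Environmental selection: non-dominated candidates survive first.
        # A full NSGA-II would break ties with crowding distance instead
        # of the simple truncation used here.
        merged = population + offspring
        front = pareto_front(merged)
        rest = sorted((c for c in merged if c not in front),
                      key=lambda c: c.objectives)
        population = (front + rest)[:pop_size]
    return pareto_front(population)

if __name__ == "__main__":
    for cand in search():
        print(cand.channels, cand.depths, cand.objectives)
```

Running `search()` prints a small Pareto front of architecture encodings; in the paper's setting, each evaluation would instead briefly train the candidate with distillation against the teacher UNet and profile its MACs with a FLOPs counter.

The distillation component can be sketched in the same hedged spirit. The loss below blends the standard DDPM denoising objective with a teacher-matching term; the epsilon-prediction formulation, the blend weight `alpha`, and the function names are assumptions rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def q_sample(x0, t, noise, alphas_cumprod):
    # Standard DDPM forward process:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

def distillation_loss(student, teacher, x0, t, alphas_cumprod, alpha=0.5):
    # Blend the usual denoising loss with a teacher-matching term.
    # `alpha` and the epsilon-prediction form are assumptions; LDMOES
    # may weight or formulate its distillation objective differently.
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise, alphas_cumprod)
    with torch.no_grad():           # the teacher stays frozen
        eps_teacher = teacher(x_t, t)
    eps_student = student(x_t, t)
    return (alpha * F.mse_loss(eps_student, noise)
            + (1.0 - alpha) * F.mse_loss(eps_student, eps_teacher))
```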