MuseumMaker: Continual Style Customization Without Catastrophic Forgetting

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2025-04-16. DOI: 10.1109/TIP.2025.3553024

Chenxi Liu;Gan Sun;Wenqi Liang;Jiahua Dong;Can Qin;Yang Cong

Volume 34, pages 2499-2512. Publication type: Journal Article. URL: https://ieeexplore.ieee.org/document/10965859/


Abstract

Pre-trained large text-to-image (T2I) models, paired with an appropriate text prompt, have attracted growing interest in customized image generation. However, catastrophic forgetting makes it hard to continually synthesize new user-provided styles while retaining satisfactory results on previously learned styles. In this paper, we propose MuseumMaker, a method that synthesizes images following a set of customized styles in a never-ending manner and gradually accumulates these creative artistic works as a Museum. When facing a new customization style, we develop a style distillation loss module to extract and learn the styles of the training data for the new image generation task. It minimizes the learning bias caused by the content of the new training images and addresses the catastrophic overfitting induced by few-shot images. To deal with catastrophic forgetting of previously learned styles, we devise a dual regularization for the shared-LoRA module to optimize the direction of the model update, which regularizes the diffusion model from both the weight and the feature aspects. Meanwhile, to further preserve historical knowledge from past styles and address the limited representability of LoRA, we design a task-wise token learning module in which a unique token embedding is learned to denote each new style. As each new user-provided style arrives, MuseumMaker captures the nuances of the new style while maintaining the details of learned styles. Experimental results on diverse style datasets validate the effectiveness of the proposed MuseumMaker method, showcasing its robustness and versatility across various scenarios.
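The abstract does not give the loss formulas, so the following is only a toy sketch of the general idea behind two of the components it names: a penalty that constrains a shared LoRA update from both the weight side and the feature side, and a per-style token table. Everything here is an assumption made for illustration: the function names (`dual_regularizer`, `register_style`), the squared-distance penalties, the probe inputs, and the Gaussian token initialization are hypothetical and are not taken from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sq_dist(a, b):
    """Squared Frobenius distance between two same-shaped matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def dual_regularizer(base, old_lora, new_lora, probe_inputs, w_weight=1.0, w_feat=1.0):
    """Toy penalty on how far a shared LoRA update drifts after a new style.

    base:         frozen weight matrix, shape (d_in, d_out)
    old_lora:     (down, up) factors saved after the previous style
    new_lora:     (down, up) factors being trained on the new style
    probe_inputs: rows of inputs used to compare layer outputs
    Assumption: both terms are plain squared distances; the paper's
    actual regularizers may differ.
    """
    old_delta = matmul(*old_lora)  # low-rank update: down @ up
    new_delta = matmul(*new_lora)
    # Weight-side term: keep the new low-rank update close to the old one.
    weight_term = sq_dist(new_delta, old_delta)
    # Feature-side term: keep layer outputs on the probe inputs close.
    old_feat = matmul(probe_inputs, add(base, old_delta))
    new_feat = matmul(probe_inputs, add(base, new_delta))
    feature_term = sq_dist(new_feat, old_feat)
    return w_weight * weight_term + w_feat * feature_term


def register_style(token_table, style_name, dim, seed=0):
    """Add one new token embedding per style; existing entries are untouched.

    Assumption: small Gaussian initialization, chosen only for the sketch.
    """
    if style_name in token_table:
        raise ValueError(f"style {style_name!r} already has a token")
    rng = random.Random(seed)
    token_table[style_name] = [rng.gauss(0.0, 0.02) for _ in range(dim)]
    return token_table[style_name]
```

In this sketch the penalty is zero when the LoRA factors are unchanged and grows as the update drifts, while `register_style` only ever adds entries, so embeddings for earlier styles stay fixed when a new style arrives.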


