Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

ArXiv Pub Date: 2024-02-15, DOI: 10.48550/arXiv.2402.10210

Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu


Abstract

Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.
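The abstract does not state the training objective, so the following is only a minimal sketch of how a self-play comparison against an earlier model version could be scored. It assumes a logistic loss on a denoising-error margin, in the spirit of preference-style losses; the function names (`softplus`, `spin_style_loss`), the `beta` parameter, and the scalar error inputs are all hypothetical and are not taken from the paper.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def spin_style_loss(
    err_real_cur: float,
    err_real_ref: float,
    err_gen_cur: float,
    err_gen_ref: float,
    beta: float = 1.0,
) -> float:
    """Illustrative self-play loss (an assumption, not the paper's exact objective).

    err_real_*: denoising error on an image from the dataset
    err_gen_*:  denoising error on an image sampled from the earlier model
    *_cur:      error of the model being trained
    *_ref:      error of the frozen earlier version
    """
    # How much the current model improves on the earlier version for each image.
    gain_real = err_real_ref - err_real_cur
    gain_gen = err_gen_ref - err_gen_cur
    # Positive margin: the current model favors the dataset image over the
    # earlier version's own sample, relative to the earlier version.
    margin = gain_real - gain_gen
    # softplus(-z) equals -log(sigmoid(z)), the usual logistic loss.
    return softplus(-beta * margin)


if __name__ == "__main__":
    # With identical current and reference errors the margin is zero.
    print(spin_style_loss(1.0, 1.0, 1.0, 1.0))
    # Fitting the dataset image better than the earlier version lowers the loss.
    print(spin_style_loss(0.5, 1.0, 1.0, 1.0))
```

In an iterative setup of this kind, the frozen reference would be replaced by the newly trained model at the end of each round, and fresh samples would be drawn from it for the next round. Note that such a loss needs only one dataset image per prompt, since the second image in each comparison is generated by the earlier model.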

