Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control

arXiv - STAT - Machine Learning; Pub Date: 2024-09-13; DOI: arxiv-2409.08861

Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, Ricky T. Q. Chen

Citations: 0

Abstract

Dynamical generative models that produce samples through an iterative process, such as Flow Matching and denoising diffusion models, have seen widespread use, but there have not been many theoretically sound methods for improving these models with reward fine-tuning. In this work, we cast reward fine-tuning as stochastic optimal control (SOC). Critically, we prove that a very specific memoryless noise schedule must be enforced during fine-tuning in order to account for the dependency between the noise variable and the generated samples. We also propose a new algorithm named Adjoint Matching, which outperforms existing SOC algorithms by casting SOC problems as regression problems. We find that our approach significantly improves over existing methods for reward fine-tuning, achieving better consistency, realism, and generalization to unseen human preference reward models, while retaining sample diversity.
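For readers unfamiliar with the term, a generic quadratic-cost stochastic optimal control problem of the kind the abstract alludes to can be sketched as below. This is the standard textbook form, not necessarily the exact objective or notation used in the paper: the symbols b (base drift of the pretrained model), \sigma (noise schedule), u (control learned during fine-tuning), r (reward), and p_0 (initial noise distribution) are all assumptions introduced here for illustration.

```latex
% Illustrative only: generic SOC problem with a terminal reward.
% b, \sigma, u, r, p_0 are assumed names, not taken from the abstract.
\begin{aligned}
\min_{u}\quad & \mathbb{E}\left[ \int_0^1 \tfrac{1}{2}\,\lVert u(X_t,t) \rVert^2 \,\mathrm{d}t \;-\; r(X_1) \right] \\
\text{s.t.}\quad & \mathrm{d}X_t = \bigl( b(X_t,t) + \sigma(t)\,u(X_t,t) \bigr)\,\mathrm{d}t + \sigma(t)\,\mathrm{d}B_t,
\qquad X_0 \sim p_0 .
\end{aligned}
```

Under this reading, the control u perturbs the pretrained dynamics to raise the expected reward of the final sample X_1, while the quadratic term penalizes departure from the base model. The abstract's claim about a memoryless noise schedule would then concern the choice of \sigma(t).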

