Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control
Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, Ricky T. Q. Chen
arXiv:2409.08861 · arXiv - STAT - Machine Learning · 2024-09-13
Dynamical generative models that produce samples through an iterative process, such as Flow Matching and denoising diffusion models, have seen widespread use, but there have been few theoretically sound methods for improving these models with reward fine-tuning. In this work, we cast reward fine-tuning as stochastic optimal control (SOC). Critically, we prove that a very specific memoryless noise schedule must be enforced during fine-tuning in order to account for the dependency between the noise variable and the generated samples. We also propose a new algorithm named Adjoint Matching, which outperforms existing SOC algorithms by casting SOC problems as regression problems. We find that our approach significantly improves over existing methods for reward fine-tuning, achieving better consistency, realism, and generalization to unseen human preference reward models, while retaining sample diversity.
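To make the abstract's recipe concrete, below is a minimal, illustrative sketch of the general structure it describes: simulate trajectories of a controlled SDE, propagate an adjoint-style signal backward from the reward gradient, and fit the control by regression onto that signal. This is not the paper's exact objective; the toy reward, the constant noise scale, the network sizes, the "pure-noise start" used as a stand-in for the memoryless schedule, and the lean transport through the base drift only are all assumptions made for the demo. See arXiv:2409.08861 for the actual Adjoint Matching algorithm.

```python
# Illustrative sketch only -- a toy rendering of the recipe in the abstract,
# not the paper's algorithm. Assumptions: 2D toy problem, constant noise
# scale, a quadratic stand-in reward, and adjoint transport through the
# frozen base drift only.
import torch
import torch.nn as nn

torch.manual_seed(0)
dim, steps, dt = 2, 50, 1.0 / 50
sigma = 1.0  # constant noise scale for the toy SDE (an assumption)

# Frozen base drift (stand-in for a pretrained flow/diffusion velocity field).
base_drift = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
for p in base_drift.parameters():
    p.requires_grad_(False)

# Learnable control (the fine-tuning correction to the drift).
control = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(control.parameters(), lr=1e-3)

def reward(x):
    # Toy reward: prefer samples near (1, 1).
    return -((x - 1.0) ** 2).sum(-1)

for it in range(200):
    # (1) Simulate controlled trajectories, starting from pure noise
    # (a crude stand-in for the paper's memoryless noise schedule).
    x = torch.randn(128, dim)
    xs, ts = [], []
    for k in range(steps):
        t = torch.full((x.shape[0], 1), k * dt)
        xs.append(x)
        ts.append(t)
        inp = torch.cat([x, t], dim=-1)
        drift = base_drift(inp) + control(inp)
        x = x + drift.detach() * dt + sigma * (dt ** 0.5) * torch.randn_like(x)

    # (2) Backward adjoint-style pass: start from the reward gradient at the
    # terminal sample and transport it back through the frozen base drift.
    xT = x.clone().requires_grad_(True)
    a = -torch.autograd.grad(reward(xT).sum(), xT)[0]  # terminal adjoint
    targets = [None] * steps
    for k in reversed(range(steps)):
        targets[k] = a.detach()
        xk = xs[k].clone().requires_grad_(True)
        inp = torch.cat([xk, ts[k]], dim=-1)
        # Vector-Jacobian product a^T (d base_drift / dx), one Euler step back.
        vjp = torch.autograd.grad((base_drift(inp) * a.detach()).sum(), xk)[0]
        a = a.detach() + vjp * dt

    # (3) Regression: match the control to -sigma * adjoint along the trajectory.
    loss = 0.0
    for k in range(steps):
        inp = torch.cat([xs[k], ts[k]], dim=-1)
        loss = loss + ((control(inp) + sigma * targets[k]) ** 2).mean()
    loss = loss / steps

    opt.zero_grad()
    loss.backward()
    opt.step()

print("final regression loss:", float(loss))
```

Note the design point this toy mirrors: because the objective is a plain least-squares fit of the control to a precomputed target, no gradients need to flow through the SDE simulation itself, which is what makes a regression formulation of SOC attractive compared with backpropagating through the sampler.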