Convergence of Flow-Based Generative Models via Proximal Gradient Descent in Wasserstein Space

IF 2.2 · CAS Zone 3 (Computer Science) · Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Xiuyuan Cheng;Jianfeng Lu;Yixin Tan;Yao Xie
DOI: 10.1109/TIT.2024.3422412
Journal: IEEE Transactions on Information Theory, vol. 70, no. 11, pp. 8087-8106
Published: 2024-07-03 (Journal Article)
Citations: 0

Abstract

Flow-based generative models enjoy certain advantages in computing the data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, analysis of flow-based models, which are deterministic in both forward (data-to-noise) and reverse (noise-to-data) directions, remains sparse. In this paper, we provide a theoretical guarantee of generating the data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderlehrer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of proximal gradient descent (GD) in Wasserstein space, we prove the Kullback-Leibler (KL) guarantee of data generation by a JKO flow model to be $O(\varepsilon^{2})$ when using $N \lesssim \log(1/\varepsilon)$ many JKO steps ($N$ residual blocks in the flow), where $\varepsilon$ is the error in the per-step first-order condition. The assumption on the data density is merely a finite second moment, and the theory extends to data distributions without density and to inversion errors in the reverse process, where we obtain mixed KL-$\mathcal{W}_{2}$ error guarantees. The non-asymptotic convergence rate of the JKO-type $\mathcal{W}_{2}$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest. The analysis framework can extend to other first-order Wasserstein optimization schemes applied to flow-based generative models.
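As a minimal numerical illustration of the exponential convergence that the analysis leverages (an illustrative sketch, not the paper's construction): in one dimension with standard-normal target $\pi = N(0,1)$, the JKO step $\rho_{k+1} = \arg\min_{\rho} \{\mathrm{KL}(\rho \| \pi) + \frac{1}{2h}\mathcal{W}_2^2(\rho, \rho_k)\}$ restricted to Gaussian iterates admits a closed-form update, and the KL objective decays geometrically, consistent with reaching KL $\le \varepsilon$ in $N \lesssim \log(1/\varepsilon)$ steps.

```python
import numpy as np

def jko_step(m, s, h):
    """One W2-proximal (JKO) step for F(rho) = KL(rho || N(0,1)),
    restricted to 1-D Gaussians rho = N(m, s^2).

    For Gaussians, W2^2(N(m,s^2), N(m',s'^2)) = (m-m')^2 + (s-s')^2,
    so the first-order conditions decouple and solve in closed form."""
    m_new = m / (1.0 + h)  # stationarity in the mean: m' + (m' - m)/h = 0
    # stationarity in sigma: s' - 1/s' + (s' - s)/h = 0, i.e.
    # (1+h) s'^2 - s s' - h = 0; take the positive root.
    s_new = (s + np.sqrt(s * s + 4.0 * h * (1.0 + h))) / (2.0 * (1.0 + h))
    return m_new, s_new

def kl_to_std_normal(m, s):
    # KL( N(m, s^2) || N(0, 1) ) = (s^2 + m^2 - 1 - 2 log s) / 2
    return 0.5 * (s * s + m * m - 1.0 - 2.0 * np.log(s))

m, s, h = 3.0, 2.0, 0.5   # illustrative initial Gaussian and step size
kls = []
for _ in range(25):
    kls.append(kl_to_std_normal(m, s))
    m, s = jko_step(m, s, h)
# KL decays geometrically (the mean contracts by 1/(1+h) per step),
# so N ~ log(1/eps) steps suffice to reach KL <= eps.
```

The per-step decrease follows the proximal-descent inequality $F(\rho_{k+1}) \le F(\rho_k) - \frac{1}{2h}\mathcal{W}_2^2(\rho_{k+1},\rho_k)$; here the toy problem stays Gaussian, which is what makes each step solvable exactly.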
Source journal: IEEE Transactions on Information Theory (Engineering: Electrical & Electronic)
CiteScore: 5.70
Self-citation rate: 20.00%
Articles per year: 514
Review time: 12 months

Journal description: The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.