{"title":"Convergence of Flow-Based Generative Models via Proximal Gradient Descent in Wasserstein Space","authors":"Xiuyuan Cheng;Jianfeng Lu;Yixin Tan;Yao Xie","doi":"10.1109/TIT.2024.3422412","DOIUrl":null,"url":null,"abstract":"Flow-based generative models enjoy certain advantages in computing the data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, analysis of flow-based models, which are deterministic in both forward (data-to-noise) and reverse (noise-to-data) directions, remain sparse. In this paper, we provide a theoretical guarantee of generating data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderleherer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of the proximal gradient descent (GD) in Wasserstein space, we prove the Kullback-Leibler (KL) guarantee of data generation by a JKO flow model to be \n<inline-formula> <tex-math>$O(\\varepsilon ^{2})$ </tex-math></inline-formula>\n when using \n<inline-formula> <tex-math>$N \\lesssim \\log (1/\\varepsilon)$ </tex-math></inline-formula>\n many JKO steps (N Residual Blocks in the flow) where \n<inline-formula> <tex-math>$\\varepsilon $ </tex-math></inline-formula>\n is the error in the per-step first-order condition. The assumption on data density is merely a finite second moment, and the theory extends to data distributions without density and when there are inversion errors in the reverse process where we obtain KL-\n<inline-formula> <tex-math>$\\mathcal {W}_{2}$ </tex-math></inline-formula>\n mixed error guarantees. The non-asymptotic convergence rate of the JKO-type \n<inline-formula> <tex-math>$\\mathcal {W}_{2}$ </tex-math></inline-formula>\n-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest. The analysis framework can extend to other first-order Wasserstein optimization schemes applied to flow-based generative models.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"70 11","pages":"8087-8106"},"PeriodicalIF":2.2000,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Theory","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10583905/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Flow-based generative models enjoy certain advantages in computing both data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, the analysis of flow-based models, which are deterministic in both the forward (data-to-noise) and reverse (noise-to-data) directions, remains sparse. In this paper, we provide a theoretical guarantee for generating the data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderlehrer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of proximal gradient descent (GD) in Wasserstein space, we prove a Kullback-Leibler (KL) guarantee of data generation by a JKO flow model of $O(\varepsilon^{2})$ when using $N \lesssim \log(1/\varepsilon)$ many JKO steps ($N$ residual blocks in the flow), where $\varepsilon$ is the error in the per-step first-order condition. The assumption on the data density is merely a finite second moment, and the theory extends to data distributions without density and to the case of inversion errors in the reverse process, where we obtain mixed KL-$\mathcal{W}_{2}$ error guarantees. The non-asymptotic convergence rate of the JKO-type $\mathcal{W}_{2}$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest. The analysis framework can extend to other first-order Wasserstein optimization schemes applied to flow-based generative models.
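For orientation, the display below is a minimal LaTeX sketch of the Wasserstein proximal (JKO) step the abstract refers to; the notation (target distribution $\pi$, step size $h$, block maps $T_k$) is generic, and the precise objective, direction, and step-size convention are those specified in the paper.

$$
\rho_{k+1} \;=\; \operatorname*{arg\,min}_{\rho \,\in\, \mathcal{P}_2(\mathbb{R}^d)} \;\Big\{\, \mathrm{KL}(\rho \,\|\, \pi) \;+\; \frac{1}{2h}\,\mathcal{W}_2^2(\rho,\rho_k) \,\Big\}, \qquad k = 0,1,\ldots,N-1.
$$

In a JKO flow network, each proximal step is realized by one residual block $T_k$ via the pushforward $\rho_{k+1} = (T_k)_{\#}\rho_k$. Read this way, the abstract's claim is that if each step satisfies the first-order optimality condition up to error $\varepsilon$, the exponential convergence of the $\mathcal{W}_2$-proximal GD yields a KL generation error of order $\varepsilon^{2}$ after only $N \lesssim \log(1/\varepsilon)$ blocks.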
About the journal:
The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.