Two Steps at a Time---Taking GAN Training in Stride with Tseng's Method

IF 2.6 Q1 MATHEMATICS, APPLIED

SIAM Journal on Mathematics of Data Science Pub Date: 2020-06-16 DOI: 10.1137/21m1420939

A. Böhm, Michael Sedlmayer, E. R. Csetnek, R. Boț

Citations: 13

Abstract

Motivated by the training of Generative Adversarial Networks (GANs), we study methods for solving minimax problems with additional nonsmooth regularizers. We do so by employing \emph{monotone operator} theory, in particular the \emph{Forward-Backward-Forward (FBF)} method, which avoids the known issue of limit cycling by correcting each update by a second gradient evaluation. Furthermore, we propose a seemingly new scheme which recycles old gradients to mitigate the additional computational cost. In doing so we rediscover a known method, related to \emph{Optimistic Gradient Descent Ascent (OGDA)}. For both schemes we prove novel convergence rates for convex-concave minimax problems via a unifying approach. The derived error bounds are in terms of the gap function for the ergodic iterates. For the deterministic and the stochastic problem we show a convergence rate of $\mathcal{O}(1/k)$ and $\mathcal{O}(1/\sqrt{k})$, respectively. We complement our theoretical results with empirical improvements in the training of Wasserstein GANs on the CIFAR10 dataset.
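The difference between plain gradient descent ascent and the two corrected schemes mentioned in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation. It assumes the bilinear objective $f(x, y) = xy$ (minimize in $x$, maximize in $y$), no regularizer (so the backward/proximal step is the identity), deterministic gradients, and a step size of 0.3. The function names `gda`, `fbf`, and `ogda` are hypothetical, and the `ogda` routine uses the standard optimistic update with a recycled gradient, which may differ in detail from the scheme analyzed in the paper.

```python
import math


def F(w):
    """Operator of f(x, y) = x * y: (df/dx, -df/dy) = (y, -x)."""
    x, y = w
    return (y, -x)


def step(w, d, alpha):
    """Return w - alpha * d, componentwise."""
    return (w[0] - alpha * d[0], w[1] - alpha * d[1])


def norm(w):
    """Euclidean distance to the saddle point (0, 0)."""
    return math.hypot(w[0], w[1])


def gda(w, alpha, iters):
    """Plain simultaneous gradient descent ascent (spirals outward here)."""
    for _ in range(iters):
        w = step(w, F(w), alpha)
    return w


def fbf(w, alpha, iters):
    """Forward-Backward-Forward with an identity backward step (assumption)."""
    for _ in range(iters):
        g = F(w)
        w_bar = step(w, g, alpha)   # forward step
        g_bar = F(w_bar)            # second gradient evaluation
        # correction: w_bar + alpha * (F(w) - F(w_bar))
        w = (w_bar[0] + alpha * (g[0] - g_bar[0]),
             w_bar[1] + alpha * (g[1] - g_bar[1]))
    return w


def ogda(w, alpha, iters):
    """Optimistic update: reuse the previous gradient instead of a second call."""
    g_prev = F(w)
    for _ in range(iters):
        g = F(w)
        d = (2 * g[0] - g_prev[0], 2 * g[1] - g_prev[1])
        w = step(w, d, alpha)
        g_prev = g
    return w


if __name__ == "__main__":
    w0, alpha, iters = (1.0, 1.0), 0.3, 500
    for name, method in [("GDA", gda), ("FBF", fbf), ("OGDA", ogda)]:
        print(name, norm(method(w0, alpha, iters)))
```

On this problem the plain scheme moves away from the saddle point at the origin, while both corrected schemes approach it. FBF spends two operator evaluations per iteration, and the optimistic variant spends one, which is the computational saving the abstract refers to.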
