Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

Mahdi Haghifam, Borja Rodríguez-Gálvez, R. Thobaben, M. Skoglund, Daniel M. Roy, G. Dziugaite
{"title":"Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization","authors":"Mahdi Haghifam, Borja Rodr'iguez-G'alvez, R. Thobaben, M. Skoglund, Daniel M. Roy, G. Dziugaite","doi":"10.48550/arXiv.2212.13556","DOIUrl":null,"url":null,"abstract":"To date, no\"information-theoretic\"frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy\"surrogate\"algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Algorithmic Learning Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.13556","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

To date, no"information-theoretic"frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy"surrogate"algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.