Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

Mahdi Haghifam, Borja Rodríguez-Gálvez, R. Thobaben, M. Skoglund, Daniel M. Roy, G. Dziugaite
{"title":"Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization","authors":"Mahdi Haghifam, Borja Rodr'iguez-G'alvez, R. Thobaben, M. Skoglund, Daniel M. Roy, G. Dziugaite","doi":"10.48550/arXiv.2212.13556","DOIUrl":null,"url":null,"abstract":"To date, no\"information-theoretic\"frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy\"surrogate\"algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Algorithmic Learning Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.13556","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

To date, no"information-theoretic"frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy"surrogate"algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.