Fast Rate Information-Theoretic Bounds on Generalization Errors

IF 2.9 | Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Xuetong Wu;Jonathan H. Manton;Uwe Aickelin;Jingge Zhu
{"title":"泛化误差的快速信息论界","authors":"Xuetong Wu;Jonathan H. Manton;Uwe Aickelin;Jingge Zhu","doi":"10.1109/TIT.2025.3581715","DOIUrl":null,"url":null,"abstract":"The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data. Various information-theoretic bounds on the generalization error have been derived in the literature, where the mutual information between the training data and the hypothesis (the output of the learning algorithm) plays an important role. Focusing on the individual sample mutual information bound by (Bu et al., 2020), which itself is a tightened version of the first bound on the topic by (Russo and Zou, 2016) and (Xu and Raginsky, 2017), this paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size <italic>n</i>. It has been recognized that these bounds are in general not tight, readily verified for the exemplary quadratic Gaussian mean estimation problem, where the individual sample mutual information bound scales as <inline-formula> <tex-math>$O(\\sqrt {1/n})$ </tex-math></inline-formula> while the true generalization error scales as <inline-formula> <tex-math>$O(1/n)$ </tex-math></inline-formula>. The first contribution of this paper is to show that the same bound can in fact be asymptotically tight if an appropriate assumption is made. In particular, we show that the fast rate can be recovered when the assumption is made on the excess risk instead of the loss function, which was usually done in existing literature. A theoretical justification is given for this choice. The second contribution of the paper is a new set of generalization error bounds based on the <inline-formula> <tex-math>$(\\eta, c)$ </tex-math></inline-formula>-central condition, a condition relatively easy to verify and has the property that the mutual information term directly determines the convergence rate of the bound. Several analytical and numerical examples are given to show the effectiveness of these bounds.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 8","pages":"6373-6392"},"PeriodicalIF":2.9000,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fast Rate Information-Theoretic Bounds on Generalization Errors\",\"authors\":\"Xuetong Wu;Jonathan H. Manton;Uwe Aickelin;Jingge Zhu\",\"doi\":\"10.1109/TIT.2025.3581715\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data. Various information-theoretic bounds on the generalization error have been derived in the literature, where the mutual information between the training data and the hypothesis (the output of the learning algorithm) plays an important role. Focusing on the individual sample mutual information bound by (Bu et al., 2020), which itself is a tightened version of the first bound on the topic by (Russo and Zou, 2016) and (Xu and Raginsky, 2017), this paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size <italic>n</i>. 
It has been recognized that these bounds are in general not tight, readily verified for the exemplary quadratic Gaussian mean estimation problem, where the individual sample mutual information bound scales as <inline-formula> <tex-math>$O(\\\\sqrt {1/n})$ </tex-math></inline-formula> while the true generalization error scales as <inline-formula> <tex-math>$O(1/n)$ </tex-math></inline-formula>. The first contribution of this paper is to show that the same bound can in fact be asymptotically tight if an appropriate assumption is made. In particular, we show that the fast rate can be recovered when the assumption is made on the excess risk instead of the loss function, which was usually done in existing literature. A theoretical justification is given for this choice. The second contribution of the paper is a new set of generalization error bounds based on the <inline-formula> <tex-math>$(\\\\eta, c)$ </tex-math></inline-formula>-central condition, a condition relatively easy to verify and has the property that the mutual information term directly determines the convergence rate of the bound. Several analytical and numerical examples are given to show the effectiveness of these bounds.\",\"PeriodicalId\":13494,\"journal\":{\"name\":\"IEEE Transactions on Information Theory\",\"volume\":\"71 8\",\"pages\":\"6373-6392\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2025-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Information Theory\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11045700/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Theory","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11045700/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data. Various information-theoretic bounds on the generalization error have been derived in the literature, where the mutual information between the training data and the hypothesis (the output of the learning algorithm) plays an important role. Focusing on the individual sample mutual information bound of Bu et al. (2020), which is itself a tightened version of the first bounds on the topic by Russo and Zou (2016) and Xu and Raginsky (2017), this paper investigates the tightness of these bounds in terms of the dependence of their convergence rates on the sample size n. These bounds are in general not tight, as is readily verified for the exemplary quadratic Gaussian mean estimation problem, where the individual sample mutual information bound scales as $O(\sqrt{1/n})$ while the true generalization error scales as $O(1/n)$. The first contribution of this paper is to show that the same bound can in fact be asymptotically tight if an appropriate assumption is made. In particular, we show that the fast rate can be recovered when the assumption is made on the excess risk rather than on the loss function (the usual choice in the existing literature). A theoretical justification is given for this choice. The second contribution of the paper is a new set of generalization error bounds based on the $(\eta, c)$-central condition, a condition that is relatively easy to verify and has the property that the mutual information term directly determines the convergence rate of the bound. Several analytical and numerical examples are given to show the effectiveness of these bounds.
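For context, the individual sample mutual information bound of Bu et al. (2020) referred to above is commonly stated, for a suitably sub-Gaussian loss, in a form along the lines of $\bigl|\mathbb{E}[\mathrm{gen}(\mu, P_{W\mid S})]\bigr| \le \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^{2}\, I(W; Z_i)}$, where $W$ is the hypothesis learned from the sample $S=(Z_1,\ldots,Z_n)$ and $\sigma^{2}$ is the sub-Gaussian parameter. This is a paraphrase of the standard statement rather than a quotation from the paper; the precise conditions are given there.

The $O(1/n)$ versus $O(\sqrt{1/n})$ comparison in the quadratic Gaussian mean estimation example can also be checked numerically. The following Monte Carlo sketch (our own illustration, not code from the paper) estimates the expected generalization gap of the sample-mean estimator under squared loss with $Z \sim \mathcal{N}(\mu, \sigma^{2})$; the exact gap in this case is $2\sigma^{2}/n$, which the simulation should track.

```python
import numpy as np

# Monte Carlo check of the O(1/n) generalization gap in quadratic
# Gaussian mean estimation (illustrative sketch only).
rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0

def expected_gen_gap(n, trials=20000):
    """Average (test risk - training risk) of the sample-mean estimator."""
    z = rng.normal(mu, sigma, size=(trials, n))         # training samples Z_1..Z_n
    w = z.mean(axis=1)                                   # empirical risk minimizer for squared loss
    test_risk = (w - mu) ** 2 + sigma ** 2               # E_Z[(w - Z)^2] in closed form
    train_risk = ((z - w[:, None]) ** 2).mean(axis=1)    # empirical risk on the training sample
    return float((test_risk - train_risk).mean())

for n in [10, 40, 160, 640]:
    gap = expected_gen_gap(n)
    print(f"n={n:4d}  simulated gap={gap:.4f}  theory 2*sigma^2/n={2 * sigma**2 / n:.4f}")
```

Quadrupling $n$ divides the simulated gap by roughly four, consistent with the $O(1/n)$ rate, whereas the individual sample mutual information bound for this problem only decays as $O(\sqrt{1/n})$.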
Source Journal
IEEE Transactions on Information Theory (Engineering Technology - Engineering: Electronic and Electrical)
CiteScore: 5.70
Self-citation rate: 20.00%
Annual article output: 514
Review time: 12 months
Journal description: The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.