Optimal Regularization for a Data Source

IF 2.7 · Zone 1 (Mathematics) · JCR Q2 (Computer Science, Theory & Methods)

Foundations of Computational Mathematics · Pub Date: 2025-01-27 · DOI: 10.1007/s10208-025-09693-y

Oscar Leong, Eliza O’Reilly, Yong Sheng Soh, Venkat Chandrasekaran



Abstract

In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structural properties in the solution. The choice of a suitable regularizer is typically driven by a combination of prior domain information and computational considerations. Convex regularizers are attractive computationally but they are limited in the types of structure they can promote. On the other hand, nonconvex regularizers are more flexible in the forms of structure they can promote and they have showcased strong empirical performance in some applications, but they come with the computational challenge of solving the associated optimization problems. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what is the optimal regularizer for data drawn from the distribution? What properties of a data source govern whether the optimal regularizer is convex? We address these questions for the class of regularizers specified by functionals that are continuous, positively homogeneous, and positive away from the origin. We say that a regularizer is optimal for a data distribution if the Gibbs density with energy given by the regularizer maximizes the population likelihood (or equivalently, minimizes cross-entropy loss) over all regularizer-induced Gibbs densities. As the regularizers we consider are in one-to-one correspondence with star bodies, we leverage dual Brunn-Minkowski theory to show that a radial function derived from a data distribution is akin to a “computational sufficient statistic” as it is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization. 
Using tools such as \(\Gamma\)-convergence from variational analysis, we show that our results are robust in the sense that the optimal regularizers for a sample drawn from a distribution converge to their population counterparts as the sample size grows large. Finally, we give generalization guarantees for various families of star bodies that recover previous results for polyhedral regularizers (i.e., dictionary learning) and lead to new ones for a variety of classes of star bodies.
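The optimality criterion in the abstract can be illustrated with a small sketch. It is not the authors' method: it assumes only three candidate regularizers (the l1, l2, and l-infinity norms, up to dilation) rather than all star bodies, and the helper names `log_volume`, `GAUGES`, `best_cross_entropy`, and `laplace_sample` are hypothetical. It relies on the standard identity that for a gauge \(\|\cdot\|_K\) of a star body \(K \subset \mathbb{R}^d\), the Gibbs density \(\exp(-\|x\|_K)\) has normalizing constant \(d!\,\mathrm{vol}(K)\), so the cross-entropy of a sample is the mean gauge value plus \(\log(d!\,\mathrm{vol}(K))\).

```python
import math
import random

def log_volume(name, d):
    """Log-volume of the unit ball of the l1, l2, or l-infinity norm in R^d."""
    if name == "l1":
        return d * math.log(2.0) - math.lgamma(d + 1)
    if name == "l2":
        return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1)
    if name == "linf":
        return d * math.log(2.0)
    raise ValueError(name)

# Candidate regularizers (gauges of the three unit balls); an illustrative choice.
GAUGES = {
    "l1": lambda x: sum(abs(v) for v in x),
    "l2": lambda x: math.sqrt(sum(v * v for v in x)),
    "linf": lambda x: max(abs(v) for v in x),
}

def best_cross_entropy(samples, name):
    """Empirical cross-entropy of the Gibbs density for ball `name`,
    minimized over dilations sK of the ball."""
    d = len(samples[0])
    gauge = GAUGES[name]
    m = sum(gauge(x) for x in samples) / len(samples)  # mean gauge value
    # For the dilated ball sK the loss is m/s + d*log(s) + log(d! vol(K)),
    # which is minimized at s = m/d, giving the closed form below.
    return d + d * math.log(m / d) + log_volume(name, d) + math.lgamma(d + 1)

def laplace_sample(rng, d):
    """One point with i.i.d. standard Laplace coordinates
    (a difference of two independent Exp(1) draws is standard Laplace)."""
    return [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(d)]

rng = random.Random(0)
samples = [laplace_sample(rng, 5) for _ in range(20000)]
scores = {name: best_cross_entropy(samples, name) for name in GAUGES}
best = min(scores, key=scores.get)
print(scores, best)
```

The synthetic data here has density exactly proportional to \(\exp(-\|x\|_1)\), so the l1 candidate should attain the lowest cross-entropy, close to the entropy \(d(1 + \log 2)\) of the source. With other data sources the ranking of the three candidates may differ.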




Source journal

Foundations of Computational Mathematics (Mathematics; Computer Science: Theory & Methods)

CiteScore

6.90

Self-citation rate

3.30%

Articles published

Review time

>12 weeks

About the journal: Foundations of Computational Mathematics (FoCM) will publish research and survey papers of the highest quality which further the understanding of the connections between mathematics and computation. The journal aims to promote the exploration of all fundamental issues underlying the creative tension among mathematics, computer science and application areas, unencumbered by any external criteria such as the pressure for applications. The journal will thus serve an increasingly important and applicable area of mathematics. The journal hopes to further the understanding of the deep relationships between mathematical theory (analysis, topology, geometry and algebra) and the computational processes as they are evolving in tandem with the modern computer. With its distinguished editorial board selecting papers of the highest quality and interest from the international community, FoCM hopes to influence both mathematics and computation. Relevance to applications will not constitute a requirement for the publication of articles. The journal does not accept code for review; however, authors who have code or data related to the submission should include a weblink to the repository where it is stored.