Peter Binev, Andrea Bonito, Ronald DeVore, Guergana Petrova
{"title":"优化学习","authors":"Peter Binev, Andrea Bonito, Ronald DeVore, Guergana Petrova","doi":"10.1007/s10092-023-00564-y","DOIUrl":null,"url":null,"abstract":"<p>This paper studies the problem of learning an unknown function <i>f</i> from given data about <i>f</i>. The learning problem is to give an approximation <span>\\({\\hat{f}}\\)</span> to <i>f</i> that predicts the values of <i>f</i> away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about <i>f</i> (known as a model class assumption), (ii) how we measure the accuracy of how well <span>\\({\\hat{f}}\\)</span> predicts <i>f</i>, (iii) what is known about the data and data sites, (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal <span>\\({\\hat{f}}\\)</span> can be found by solving a certain finite-dimensional over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation <span>\\({\\hat{f}}\\)</span> of the function <i>f</i> from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of <i>f</i>. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.</p>","PeriodicalId":1,"journal":{"name":"Accounts of Chemical Research","volume":null,"pages":null},"PeriodicalIF":16.4000,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Optimal learning\",\"authors\":\"Peter Binev, Andrea Bonito, Ronald DeVore, Guergana Petrova\",\"doi\":\"10.1007/s10092-023-00564-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>This paper studies the problem of learning an unknown function <i>f</i> from given data about <i>f</i>. The learning problem is to give an approximation <span>\\\\({\\\\hat{f}}\\\\)</span> to <i>f</i> that predicts the values of <i>f</i> away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about <i>f</i> (known as a model class assumption), (ii) how we measure the accuracy of how well <span>\\\\({\\\\hat{f}}\\\\)</span> predicts <i>f</i>, (iii) what is known about the data and data sites, (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal <span>\\\\({\\\\hat{f}}\\\\)</span> can be found by solving a certain finite-dimensional over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. 
This explains the advantage of over-parameterization which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation <span>\\\\({\\\\hat{f}}\\\\)</span> of the function <i>f</i> from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of <i>f</i>. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.</p>\",\"PeriodicalId\":1,\"journal\":{\"name\":\"Accounts of Chemical Research\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":16.4000,\"publicationDate\":\"2024-02-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Accounts of Chemical Research\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1007/s10092-023-00564-y\",\"RegionNum\":1,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accounts of Chemical Research","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1007/s10092-023-00564-y","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
Abstract
This paper studies the problem of learning an unknown function f from given data about f. The learning problem is to give an approximation \({\hat{f}}\) to f that predicts the values of f away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about f (known as a model class assumption), (ii) how we measure how well \({\hat{f}}\) predicts f, (iii) what is known about the data and data sites, and (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal \({\hat{f}}\) can be found by solving a certain finite-dimensional over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization, which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation \({\hat{f}}\) of the function f from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of f. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.
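To make the over-parameterized, penalized formulation concrete, the following is a minimal numerical sketch, not the authors' construction: the random ReLU feature model, the squared-error data-fit term, the ℓ2 penalty, and the weight lam are illustrative assumptions standing in for the paper's specific loss function and penalty scaling.

```python
# Minimal sketch (illustrative, not the paper's construction): recover f from m
# samples by fitting an over-parameterized model (n >> m random ReLU features)
# with a penalized loss:  data-fit term + lam * ||coefficients||^2.
import numpy as np

rng = np.random.default_rng(0)

def f(x):                      # "unknown" target function (a smooth example)
    return np.sin(2 * np.pi * x)

m, n = 20, 500                 # m data sites, n >> m parameters (over-parameterized)
x_data = np.sort(rng.uniform(0.0, 1.0, m))
y_data = f(x_data)             # noiseless observations of f at the data sites

# Random ReLU feature map phi(x) in R^n (an assumed, generic over-parameterized model)
w = rng.normal(size=n)
b = rng.uniform(-1.0, 1.0, n)
def features(x):
    return np.maximum(np.outer(x, w) + b, 0.0)   # shape (len(x), n)

A = features(x_data)           # m x n design matrix, m << n

# Penalized least squares:  min_c ||A c - y||^2 + lam * ||c||^2,
# solved via the normal equations; lam plays the role of the penalty scaling.
lam = 1e-6
c = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y_data)

def f_hat(x):                  # the learned approximation \hat{f}
    return features(x) @ c

x_test = np.linspace(0.0, 1.0, 200)
print("max error away from the data:", np.max(np.abs(f_hat(x_test) - f(x_test))))
```

The sketch only conveys the shape of the problem: far more parameters (n) than data sites (m), with a penalty term selecting among the many functions that fit the data; the paper's results quantify how large the over-parameterization must be and how the penalty must be scaled so that the resulting \({\hat{f}}\) is near optimal.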