Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
Author: M. Belkin
Journal: Acta Numerica, volume 30, pages 203-248
Published: 2021-05-01
DOI: 10.1017/S0962492921000039 (https://doi.org/10.1017/S0962492921000039)
Citation count: 116
Abstract
In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation and its sibling over-parametrization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parametrization enables interpolation and provides flexibility to select a suitable interpolating model. As we will see, just as a physical prism separates colours mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern machine learning. This article is written in the belief and hope that clearer understanding of these issues will bring us a step closer towards a general theory of deep learning and machine learning.
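To make the two themes concrete, here is a minimal sketch that is not taken from the paper. It assumes a plain linear model with synthetic Gaussian features, and every name in it (`min_norm_interpolant`, `w_true`, the sample sizes) is hypothetical. With far more parameters than samples, noisy labels can be fitted exactly, and there are many exact fits to choose from; the minimum-norm one is only one possible selection rule.

```python
import numpy as np


def min_norm_interpolant(X, y):
    """Return the minimum-l2-norm w with X @ w = y (assumes X has full row rank)."""
    return np.linalg.pinv(X) @ y


rng = np.random.default_rng(0)
n_samples, n_features = 20, 200  # over-parametrized: many more features than samples

# Synthetic data (an assumption for illustration): sparse signal plus label noise.
X = rng.standard_normal((n_samples, n_features))
w_true = np.zeros(n_features)
w_true[:5] = 1.0
y = X @ w_true + 0.5 * rng.standard_normal(n_samples)

# Interpolation: the fit reproduces the noisy labels exactly (up to round-off).
w_min = min_norm_interpolant(X, y)

# Flexibility: adding any null-space vector of X yields another interpolant.
z = rng.standard_normal(n_features)
null_component = z - np.linalg.pinv(X) @ (X @ z)  # projection of z onto null(X)
w_other = w_min + null_component

train_err_min = np.max(np.abs(X @ w_min - y))
train_err_other = np.max(np.abs(X @ w_other - y))

print("max training residual, min-norm fit:", train_err_min)
print("max training residual, other fit:   ", train_err_other)
print("norms:", np.linalg.norm(w_min), np.linalg.norm(w_other))
```

Both weight vectors fit the training data exactly, yet they differ in norm and will generally predict differently on new inputs, which is why the choice among interpolating models matters.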
About the journal:
Acta Numerica stands as the preeminent mathematics journal, ranking highest in both Impact Factor and MCQ metrics. This annual journal publishes survey articles by prominent researchers in numerical analysis, scientific computing, and computational mathematics. These articles give comprehensive overviews of recent advances, covering state-of-the-art techniques and analyses.
Encompassing the entirety of numerical analysis, the articles are crafted in an accessible style, catering to researchers at all levels and serving as valuable teaching aids for advanced instruction. The broad subject areas covered include computational methods in linear algebra, optimization, ordinary and partial differential equations, approximation theory, stochastic analysis, nonlinear dynamical systems, as well as the application of computational techniques in science and engineering. Acta Numerica also delves into the mathematical theory underpinning numerical methods, making it a versatile and authoritative resource in the field of mathematics.