Neural network approximation

Impact Factor 16.3 · CAS Tier 1 (Mathematics) · JCR Q1 (Mathematics)
R. DeVore, B. Hanin, G. Petrova
{"title":"Neural network approximation","authors":"R. DeVore, B. Hanin, G. Petrova","doi":"10.1017/S0962492921000052","DOIUrl":null,"url":null,"abstract":"Neural networks (NNs) are the method of choice for building learning algorithms. They are now being investigated for other numerical tasks such as solving high-dimensional partial differential equations. Their popularity stems from their empirical success on several challenging learning problems (computer chess/Go, autonomous navigation, face recognition). However, most scholars agree that a convincing theoretical explanation for this success is still lacking. Since these applications revolve around approximating an unknown function from data observations, part of the answer must involve the ability of NNs to produce accurate approximations. This article surveys the known approximation properties of the outputs of NNs with the aim of uncovering the properties that are not present in the more traditional methods of approximation used in numerical analysis, such as approximations using polynomials, wavelets, rational functions and splines. Comparisons are made with traditional approximation methods from the viewpoint of rate distortion, i.e. error versus the number of parameters used to create the approximant. Another major component in the analysis of numerical approximation is the computational time needed to construct the approximation, and this in turn is intimately connected with the stability of the approximation algorithm. So the stability of numerical approximation using NNs is a large part of the analysis put forward. The survey, for the most part, is concerned with NNs using the popular ReLU activation function. In this case the outputs of the NNs are piecewise linear functions on rather complicated partitions of the domain of f into cells that are convex polytopes. When the architecture of the NN is fixed and the parameters are allowed to vary, the set of output functions of the NN is a parametrized nonlinear manifold. It is shown that this manifold has certain space-filling properties leading to an increased ability to approximate (better rate distortion) but at the expense of numerical stability. The space filling creates the challenge to the numerical method of finding best or good parameter choices when trying to approximate.","PeriodicalId":48863,"journal":{"name":"Acta Numerica","volume":"30 1","pages":"327 - 444"},"PeriodicalIF":16.3000,"publicationDate":"2020-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"111","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Acta Numerica","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1017/S0962492921000052","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS","Score":null,"Total":0}
引用次数: 111

Abstract

Neural networks (NNs) are the method of choice for building learning algorithms. They are now being investigated for other numerical tasks such as solving high-dimensional partial differential equations. Their popularity stems from their empirical success on several challenging learning problems (computer chess/Go, autonomous navigation, face recognition). However, most scholars agree that a convincing theoretical explanation for this success is still lacking. Since these applications revolve around approximating an unknown function from data observations, part of the answer must involve the ability of NNs to produce accurate approximations. This article surveys the known approximation properties of the outputs of NNs with the aim of uncovering the properties that are not present in the more traditional methods of approximation used in numerical analysis, such as approximations using polynomials, wavelets, rational functions and splines. Comparisons are made with traditional approximation methods from the viewpoint of rate distortion, i.e. error versus the number of parameters used to create the approximant. Another major component in the analysis of numerical approximation is the computational time needed to construct the approximation, and this in turn is intimately connected with the stability of the approximation algorithm. So the stability of numerical approximation using NNs is a large part of the analysis put forward. The survey, for the most part, is concerned with NNs using the popular ReLU activation function. In this case the outputs of the NNs are piecewise linear functions on rather complicated partitions of the domain of f into cells that are convex polytopes. When the architecture of the NN is fixed and the parameters are allowed to vary, the set of output functions of the NN is a parametrized nonlinear manifold. It is shown that this manifold has certain space-filling properties leading to an increased ability to approximate (better rate distortion) but at the expense of numerical stability. The space filling creates the challenge to the numerical method of finding best or good parameter choices when trying to approximate.
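
To make the piecewise-linear picture concrete, here is a minimal sketch (not from the paper; the network size and random weights are made-up assumptions) of a one-hidden-layer ReLU network in one dimension. Its output is piecewise linear, and its breakpoints sit exactly where individual hidden units switch on or off.

```python
import numpy as np

# Illustrative sketch (not from the paper; sizes and weights are made up):
# a one-hidden-layer ReLU network in one dimension,
#     f(x) = sum_i a_i * max(0, w_i * x + b_i).
# Each hidden unit contributes a "hinge", so the output is piecewise linear
# with breakpoints at x = -b_i / w_i, where unit i switches on or off.

rng = np.random.default_rng(0)
n_units = 5
w = rng.standard_normal(n_units)   # hidden-layer weights
b = rng.standard_normal(n_units)   # hidden-layer biases
a = rng.standard_normal(n_units)   # output-layer coefficients

def relu_net(x):
    """Evaluate the network at an array of points x."""
    pre = np.outer(x, w) + b          # pre-activations, shape (len(x), n_units)
    return np.maximum(pre, 0.0) @ a   # ReLU, then linear read-out

bp = np.sort(-b / w)                  # breakpoints of the output
print("breakpoints:", np.round(bp, 4))

# The output is linear on each interval between consecutive breakpoints, so a
# finite difference taken strictly inside each piece recovers that piece's slope.
mids = np.concatenate(([bp[0] - 1.0], (bp[:-1] + bp[1:]) / 2, [bp[-1] + 1.0]))
eps = 1e-6
piece_slopes = (relu_net(mids + eps) - relu_net(mids)) / eps
print("slope on each of the", n_units + 1, "pieces:", np.round(piece_slopes, 4))
```

Deeper ReLU networks compose such maps, and in higher dimensions the breakpoints become hyperplanes whose arrangement carves the domain into the convex polytope cells mentioned in the abstract.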
Source journal: Acta Numerica
CiteScore: 26.00
Self-citation rate: 0.70%
Articles per year: 7
About the journal: Acta Numerica is the preeminent mathematics journal, ranking highest in both Impact Factor and MCQ metrics. This annual journal publishes survey papers by prominent researchers in numerical analysis, scientific computing and computational mathematics, delivering comprehensive overviews of recent advances and state-of-the-art techniques and analyses. Spanning the whole of numerical analysis, the articles are written in an accessible style, serving researchers at all levels and doubling as teaching aids for advanced instruction. Subject areas include computational methods in linear algebra, optimization, ordinary and partial differential equations, approximation theory, stochastic analysis and nonlinear dynamical systems, as well as applications of computational techniques in science and engineering. Acta Numerica also covers the mathematical theory underpinning numerical methods, making it a versatile and authoritative resource in the field.