Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection

SIAM Journal on Mathematics of Data Science (IF 1.9, Q1, Mathematics, Applied)
Elizabeth Newman, Lars Ruthotto, Joseph L. Hart, B. V. B. Waanders
DOI: 10.1137/20m1359511 · Published 2020-07-26 · pp. 1041-1066 · Citations: 14

Abstract

Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations (PDE) and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, non-convexity, data-sparsity, and non-trivial choice of hyperparameters. To solve the optimization problem more efficiently, we propose the use of variable projection (VarPro), a method originally designed for separable nonlinear least-squares problems. Our main contribution is the Gauss-Newton VarPro method (GNvpro) that extends the reach of the VarPro idea to non-quadratic objective functions, most notably, cross-entropy loss functions arising in classification. These extensions make GNvpro applicable to all training problems that involve a DNN whose last layer is an affine mapping, which is common in many state-of-the-art architectures. In numerical experiments from classification and surrogate modeling, GNvpro not only solves the optimization problem more efficiently but also yields DNNs that generalize better than commonly-used optimization schemes.
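The sketch below illustrates the classical variable-projection reduction for the separable least-squares setting the abstract refers to: for a network whose last layer is affine, the output weights can be eliminated in closed form, leaving a reduced objective over the hidden-layer parameters only. All names (phi, reduced_loss, hidden_dim, alpha) and the toy data are illustrative assumptions, not from the paper, and the paper's GNvpro extension of this idea to non-quadratic losses such as cross-entropy via a Gauss-Newton scheme is not reproduced here.

```python
# Minimal VarPro sketch: eliminate the affine output weights analytically,
# then optimize only the hidden-layer parameters (illustrative, least-squares loss).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy regression data: y = sin(3x) on [-1, 1].
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
Y = np.sin(3.0 * X)

in_dim, hidden_dim = 1, 10
alpha = 1e-6  # small ridge term keeping the inner linear solve well posed

def phi(theta, X):
    """Nonlinear feature map: one tanh hidden layer plus a bias feature (affine last layer)."""
    W1 = theta[: in_dim * hidden_dim].reshape(in_dim, hidden_dim)
    b1 = theta[in_dim * hidden_dim :]
    H = np.tanh(X @ W1 + b1)
    return np.hstack([H, np.ones((X.shape[0], 1))])

def solve_linear(theta):
    """Inner problem: optimal affine output weights for fixed theta (ridge-regularized least squares)."""
    Z = phi(theta, X)
    A = Z.T @ Z + alpha * np.eye(Z.shape[1])
    W = np.linalg.solve(A, Z.T @ Y)
    return Z, W

def reduced_loss(theta):
    """Reduced (projected) objective: the linear weights have been eliminated in closed form."""
    Z, W = solve_linear(theta)
    return 0.5 * np.sum((Z @ W - Y) ** 2)

theta0 = 0.1 * rng.standard_normal(in_dim * hidden_dim + hidden_dim)
res = minimize(reduced_loss, theta0, method="L-BFGS-B")  # gradients via finite differences

Z, W = solve_linear(res.x)
print("reduced-objective value:", res.fun)
print("training MSE:", np.mean((Z @ W - Y) ** 2))
```

Because the output weights are recovered exactly for every candidate theta, the outer optimizer searches only over the nonlinear parameters, which is what makes the approach attractive for any architecture whose final layer is an affine mapping.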