Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection

Impact factor: 1.9 · JCR Q1, Mathematics, Applied
Elizabeth Newman, Lars Ruthotto, Joseph L. Hart, B. V. B. Waanders
{"title":"Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection","authors":"Elizabeth Newman, Lars Ruthotto, Joseph L. Hart, B. V. B. Waanders","doi":"10.1137/20m1359511","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations (PDE) and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, non-convexity, data-sparsity, and non-trivial choice of hyperparameters. To solve the optimization problem more efficiently, we propose the use of variable projection (VarPro), a method originally designed for separable nonlinear least-squares problems. Our main contribution is the Gauss-Newton VarPro method (GNvpro) that extends the reach of the VarPro idea to non-quadratic objective functions, most notably, cross-entropy loss functions arising in classification. These extensions make GNvpro applicable to all training problems that involve a DNN whose last layer is an affine mapping, which is common in many state-of-the-art architectures. In numerical experiments from classification and surrogate modeling, GNvpro not only solves the optimization problem more efficiently but also yields DNNs that generalize better than commonly-used optimization schemes.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"10 2 1","pages":"1041-1066"},"PeriodicalIF":1.9000,"publicationDate":"2020-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SIAM journal on mathematics of data science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1137/20m1359511","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
Citations: 14

Abstract

Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations (PDE) and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, non-convexity, data-sparsity, and non-trivial choice of hyperparameters. To solve the optimization problem more efficiently, we propose the use of variable projection (VarPro), a method originally designed for separable nonlinear least-squares problems. Our main contribution is the Gauss-Newton VarPro method (GNvpro) that extends the reach of the VarPro idea to non-quadratic objective functions, most notably, cross-entropy loss functions arising in classification. These extensions make GNvpro applicable to all training problems that involve a DNN whose last layer is an affine mapping, which is common in many state-of-the-art architectures. In numerical experiments from classification and surrogate modeling, GNvpro not only solves the optimization problem more efficiently but also yields DNNs that generalize better than commonly-used optimization schemes.
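The variable projection idea at the heart of the paper can be made concrete on a small separable least-squares problem. The sketch below is a minimal NumPy illustration, not the authors' GNvpro implementation: for a model f(x) = W * phi(x; theta) whose last layer W is affine, the inner linear problem in W is solved in closed form, leaving a reduced objective in the nonlinear weights theta only. The feature map, layer sizes, and synthetic data are illustrative assumptions.

# Minimal variable-projection sketch (illustrative only, not GNvpro).
import numpy as np

def features(theta, X, hidden=16):
    """Hypothetical nonlinear feature map phi(x; theta): one tanh hidden layer plus a bias column."""
    d = X.shape[1]
    K = theta[: hidden * d].reshape(hidden, d)       # hidden-layer weights
    b = theta[hidden * d : hidden * d + hidden]      # hidden-layer biases
    Z = np.tanh(X @ K.T + b)                         # (n, hidden) hidden activations
    return np.hstack([Z, np.ones((X.shape[0], 1))])  # bias column makes the last layer affine

def reduced_objective(theta, X, Y):
    """Eliminate the linear (last-layer) weights W by an inner least-squares solve;
    return the projected residual objective, which depends on theta only."""
    Phi = features(theta, X)                          # (n, hidden + 1)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)       # closed-form inner solve: W(theta)
    R = Phi @ W - Y                                   # projected residual
    return 0.5 * np.sum(R ** 2), W

# Usage: an outer optimizer updates only theta; W is recovered on the fly for each theta.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
Y = np.sin(X @ rng.standard_normal((3, 1)))           # synthetic regression targets
theta0 = 0.1 * rng.standard_normal(16 * 3 + 16)
loss, W_opt = reduced_objective(theta0, X, Y)
print(f"reduced loss at theta0: {loss:.4f}")

In this quadratic setting the inner solve is an ordinary least-squares problem; the paper's contribution, GNvpro, extends the same elimination to non-quadratic losses such as cross-entropy, where the inner problem no longer has a closed form and is solved approximately.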