Title: Note on generalization, regularization and architecture selection in nonlinear learning systems
Author: J. Moody
Published in: Neural Networks for Signal Processing — Proceedings of the 1991 IEEE Workshop, 30 September 1991
DOI: 10.1109/NNSP.1991.239541
Citations: 157
Abstract
The author proposes a new estimate of generalization performance for nonlinear learning systems, called the generalized prediction error (GPE), which is based on the notion of the effective number of parameters p_eff(λ). GPE requires neither a test set nor computationally intensive cross validation, and it generalizes previously proposed model selection criteria (such as GCV, FPE, AIC, and PSE) in that it is formulated to cover biased, nonlinear models (such as backpropagation networks) that may incorporate weight decay or other regularizers. The effective number of parameters p_eff(λ) depends on the amount of bias and smoothness in the model (as determined by the regularization parameter λ) and generally differs from the number of weights p. Constructing an optimal architecture therefore requires not only finding the weights w*_λ that minimize the training function U(λ, w), but also finding the λ that minimizes GPE(λ).
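The effective-parameters idea in the abstract can be made concrete in the linear (ridge/weight-decay) special case, where p_eff(λ) reduces to the trace of the hat matrix. The sketch below is an illustration of that special case, not Moody's full nonlinear derivation: it fits ridge regression over a grid of λ values, computes p_eff(λ) = tr(H(λ)), and selects the λ minimizing a GPE-style score of the form training MSE + 2σ²·p_eff(λ)/n, assuming for simplicity that the noise variance σ² is known. All function names and the synthetic data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 10
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.0, 0.5]              # only 3 informative weights
y = X @ w_true + rng.normal(scale=0.5, size=n)

def fit_ridge(X, y, lam):
    """Minimize U(lam, w) = ||y - Xw||^2 + lam ||w||^2 (weight decay)."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

def p_eff(X, lam):
    """Effective number of parameters: trace of the hat matrix H(lam)."""
    k = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(k), X.T)
    return float(np.trace(H))

def gpe_score(X, y, lam, sigma2):
    """GPE-style estimate: training MSE + 2 * sigma^2 * p_eff(lam) / n."""
    m = len(y)
    w = fit_ridge(X, y, lam)
    mse = float(np.mean((y - X @ w) ** 2))
    return mse + 2.0 * sigma2 * p_eff(X, lam) / m

lambdas = np.logspace(-3, 2, 30)
sigma2 = 0.25                               # assumed known noise variance
scores = [gpe_score(X, y, lam, sigma2) for lam in lambdas]
best_lam = float(lambdas[int(np.argmin(scores))])

print(f"p_eff(0.001) = {p_eff(X, 0.001):.2f}")   # close to p = 10
print(f"p_eff(100.0) = {p_eff(X, 100.0):.2f}")   # heavily shrunk
print(f"selected lambda = {best_lam:.3g}")
```

As λ grows, p_eff(λ) shrinks below p, which is exactly the abstract's point: the optimal architecture is found jointly over the weights (via the regularized training objective) and over λ (via the estimated prediction error).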