Training error, generalization error and learning curves in neural learning

Proceedings 1995 Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems. Pub Date: 1995-11-20. DOI: 10.1109/ANNES.1995.499426

S. Amari


Citations: 6

Abstract

A neural network is trained on a set of available examples by minimizing the training error, so that the network parameters fit the examples well. The real goal, however, is to minimize the generalization error, to which no direct access is possible. The two errors differ because of statistical fluctuation in the examples. This article studies the problem from a statistical point of view. When the number of training examples is large, a universal asymptotic evaluation of the discrepancy between the two errors is available, and it can be used for model selection based on an information criterion. When the number of training examples is small, the discrepancy is large and causes serious overfitting, or overtraining. We analyze this phenomenon using a simple model. Surprisingly, the generalization error can even increase as the number of examples increases within a certain range, which shows the inadequacy of the minimum-training-error learning method. We evaluate various means of overcoming overtraining, such as cross-validated early stopping, the introduction of regularization terms, and model selection.
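The gap between training error and generalization error can be illustrated with a toy simulation. The sketch below is not the model analyzed in the paper; it assumes the simplest possible learner, one that estimates the mean of an m-dimensional Gaussian from t examples under squared-error loss. For that toy case the expected training error is mσ²(1 − 1/t) and the expected generalization error is mσ²(1 + 1/t), so the gap is 2mσ²/t: it grows with the number of parameters and shrinks with the number of examples. The function name `simulate_gap` and all parameter values are assumptions chosen for illustration.

```python
import random


def simulate_gap(m, t, sigma=1.0, trials=2000, seed=0):
    """Average training and generalization error over many training sets.

    Toy model (an assumption, not the paper's): each example is an
    m-dimensional Gaussian vector with true mean 0 and per-coordinate
    standard deviation sigma. The learner fits the sample mean, which
    minimizes the squared-error training loss.
    """
    rng = random.Random(seed)
    train_sum = 0.0
    gen_sum = 0.0
    for _ in range(trials):
        xs = [[rng.gauss(0.0, sigma) for _ in range(m)] for _ in range(t)]
        # Minimum-training-error estimate: the coordinate-wise sample mean.
        theta_hat = [sum(x[j] for x in xs) / t for j in range(m)]
        # Training error: mean squared distance on the examples used to fit.
        train = sum((x[j] - theta_hat[j]) ** 2 for x in xs for j in range(m)) / t
        # Generalization error on a fresh example, in closed form:
        # E||x - theta_hat||^2 = m * sigma^2 + ||theta_hat - 0||^2.
        gen = m * sigma ** 2 + sum(th ** 2 for th in theta_hat)
        train_sum += train
        gen_sum += gen
    return train_sum / trials, gen_sum / trials


if __name__ == "__main__":
    m, t = 5, 50
    train_err, gen_err = simulate_gap(m, t)
    print("training error:      ", round(train_err, 3))
    print("generalization error:", round(gen_err, 3))
    print("observed gap:        ", round(gen_err - train_err, 3))
    print("predicted gap 2m/t:  ", 2 * m / t)
```

Adding the predicted gap to the observed training error gives an estimate of the generalization error that uses training data only, which is the idea behind information-criterion model selection. With small t relative to m, the gap dominates and the training error becomes a poor guide.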
