Benign interpolation of noise in deep learning

Marthinus W. Theunissen, Marelie Hattingh Davel, E. Barnard
South African Computer Journal, published 2020-12-08 (JCR: Q3, Social Sciences)
DOI: 10.18489/sacj.v32i2.833
Citations: 4

Abstract

The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about the bias-variance tradeoff in learning. We expand upon relevant existing work by discussing local attributes of neural network training within the context of a relatively simple framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the deep learning model to generalise in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterised multilayer perceptrons and controlled training data noise. The main insights are that deep learning models are optimised for training data modularly, with different regions in the function space dedicated to fitting distinct types of sample information. Additionally, we show that models tend to fit uncorrupted samples first. Based on this finding, we propose a conjecture to explain an observed instance of the epoch-wise double-descent phenomenon. Our findings suggest that the notion of model capacity needs to be modified to consider the distributed way training data is fitted across sub-units.
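As an illustration of the kind of controlled-noise experiment the abstract describes, the sketch below trains an overparameterised multilayer perceptron on synthetic two-class data in which a fixed fraction of labels has been flipped, and tracks training accuracy separately on clean and corrupted samples. This is a minimal sketch, not the authors' setup: the synthetic data, architecture, noise fraction, and hyperparameters are all assumptions chosen only to make the "uncorrupted samples are fitted first" behaviour easy to observe.

```python
# Minimal sketch (assumed setup, not the paper's): an overparameterised MLP is
# trained to interpolate data with a controlled fraction of flipped labels, and
# per-epoch training accuracy is reported separately for clean and noisy samples.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic two-class data: two Gaussian blobs in 20 dimensions.
n_per_class, dim, noise_frac = 500, 20, 0.2   # 20% of labels are flipped
x = torch.cat([torch.randn(n_per_class, dim) + 1.5,
               torch.randn(n_per_class, dim) - 1.5])
y_clean = torch.cat([torch.zeros(n_per_class, dtype=torch.long),
                     torch.ones(n_per_class, dtype=torch.long)])

# Corrupt a random subset of labels: this is the noise the model will interpolate.
n = x.shape[0]
noisy_idx = torch.randperm(n)[: int(noise_frac * n)]
y = y_clean.clone()
y[noisy_idx] = 1 - y[noisy_idx]
is_noisy = torch.zeros(n, dtype=torch.bool)
is_noisy[noisy_idx] = True

# Overparameterised MLP: far more weights than training samples.
model = nn.Sequential(nn.Linear(dim, 1024), nn.ReLU(),
                      nn.Linear(1024, 1024), nn.ReLU(),
                      nn.Linear(1024, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    opt.zero_grad()
    logits = model(x)
    loss = loss_fn(logits, y)          # fit the *corrupted* labels
    loss.backward()
    opt.step()

    with torch.no_grad():
        fitted = logits.argmax(dim=1) == y   # sample-wise training accuracy
        clean_acc = fitted[~is_noisy].float().mean().item()
        noisy_acc = fitted[is_noisy].float().mean().item()
    if epoch % 20 == 0:
        # Typically the clean samples reach high training accuracy well before
        # the flipped ones, consistent with the abstract's fitting-order claim.
        print(f"epoch {epoch:3d}  clean fit {clean_acc:.2f}  noisy fit {noisy_acc:.2f}")
```

Extending this loop with held-out test accuracy is one way to look for the epoch-wise double-descent behaviour the abstract refers to, since test performance can dip while the noisy labels are being interpolated and recover afterwards.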
Source journal

South African Computer Journal (Social Sciences: Education)
CiteScore: 1.30
Self-citation rate: 0.00%
Articles per year: 10
Review turnaround: 24 weeks

Journal description: The South African Computer Journal is a specialist ICT academic journal, accredited by the South African Department of Higher Education and Training. SACJ publishes research articles, viewpoints and communications in English in Computer Science and Information Systems.