Benign interpolation of noise in deep learning

Marthinus W. Theunissen, Marelie Hattingh Davel, E. Barnard
South African Computer Journal, published 2020-12-08 (JCR: Q3, Social Sciences)
DOI: 10.18489/sacj.v32i2.833
Citations: 4

Abstract

The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about the bias-variance tradeoff in learning. We expand upon relevant existing work by discussing local attributes of neural network training within the context of a relatively simple framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the deep learning model to generalise in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterised multilayer perceptrons and controlled training data noise. The main insights are that deep learning models are optimised for training data modularly, with different regions in the function space dedicated to fitting distinct types of sample information. Additionally, we show that models tend to fit uncorrupted samples first. Based on this finding, we propose a conjecture to explain an observed instance of the epoch-wise double-descent phenomenon. Our findings suggest that the notion of model capacity needs to be modified to consider the distributed way training data is fitted across sub-units.
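As an illustration of the kind of controlled-noise experiment the abstract describes, the sketch below trains an overparameterised multilayer perceptron on synthetic two-class data in which a fixed fraction of labels has been flipped, and tracks training accuracy separately on clean and corrupted samples. This is a minimal sketch, not the authors' setup: the synthetic data, architecture, noise fraction, and hyperparameters are all assumptions chosen only to make the "uncorrupted samples are fitted first" behaviour easy to observe.

```python
# Minimal sketch (assumed setup, not the paper's): an overparameterised MLP is
# trained to interpolate data with a controlled fraction of flipped labels, and
# per-epoch training accuracy is reported separately for clean and noisy samples.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic two-class data: two Gaussian blobs in 20 dimensions.
n_per_class, dim, noise_frac = 500, 20, 0.2   # 20% of labels are flipped
x = torch.cat([torch.randn(n_per_class, dim) + 1.5,
               torch.randn(n_per_class, dim) - 1.5])
y_clean = torch.cat([torch.zeros(n_per_class, dtype=torch.long),
                     torch.ones(n_per_class, dtype=torch.long)])

# Corrupt a random subset of labels: this is the noise the model will interpolate.
n = x.shape[0]
noisy_idx = torch.randperm(n)[: int(noise_frac * n)]
y = y_clean.clone()
y[noisy_idx] = 1 - y[noisy_idx]
is_noisy = torch.zeros(n, dtype=torch.bool)
is_noisy[noisy_idx] = True

# Overparameterised MLP: far more weights than training samples.
model = nn.Sequential(nn.Linear(dim, 1024), nn.ReLU(),
                      nn.Linear(1024, 1024), nn.ReLU(),
                      nn.Linear(1024, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    opt.zero_grad()
    logits = model(x)
    loss = loss_fn(logits, y)          # fit the *corrupted* labels
    loss.backward()
    opt.step()

    with torch.no_grad():
        fitted = logits.argmax(dim=1) == y   # sample-wise training accuracy
        clean_acc = fitted[~is_noisy].float().mean().item()
        noisy_acc = fitted[is_noisy].float().mean().item()
    if epoch % 20 == 0:
        # Typically the clean samples reach high training accuracy well before
        # the flipped ones, consistent with the abstract's fitting-order claim.
        print(f"epoch {epoch:3d}  clean fit {clean_acc:.2f}  noisy fit {noisy_acc:.2f}")
```

Extending this loop with held-out test accuracy is one way to look for the epoch-wise double-descent behaviour the abstract refers to, since test performance can dip while the noisy labels are being interpolated and recover afterwards.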
Source journal

South African Computer Journal (Social Sciences: Education)
CiteScore: 1.30
Self-citation rate: 0.00%
Articles per year: 10
Review turnaround: 24 weeks

Journal description: The South African Computer Journal is a specialist ICT academic journal, accredited by the South African Department of Higher Education and Training. SACJ publishes research articles, viewpoints and communications in English in Computer Science and Information Systems.