Deep learning, stochastic gradient descent and diffusion maps

Carmina Fjellström, Kaj Nyström
{"title":"深度学习,随机梯度下降和扩散图","authors":"Carmina Fjellström,&nbsp;Kaj Nyström","doi":"10.1016/j.jcmds.2022.100054","DOIUrl":null,"url":null,"abstract":"<div><p>Stochastic gradient descent (SGD) is widely used in deep learning due to its computational efficiency, but a complete understanding of why SGD performs so well remains a major challenge. It has been observed empirically that most eigenvalues of the Hessian of the loss functions on the loss landscape of over-parametrized deep neural networks are close to zero, while only a small number of eigenvalues are large. Zero eigenvalues indicate zero diffusion along the corresponding directions. This indicates that the process of minima selection mainly happens in the relatively low-dimensional subspace corresponding to the top eigenvalues of the Hessian. Although the parameter space is very high-dimensional, these findings seems to indicate that the SGD dynamics may mainly live on a low-dimensional manifold. In this paper, we pursue a truly data driven approach to the problem of getting a potentially deeper understanding of the high-dimensional parameter surface, and in particular, of the landscape traced out by SGD by analyzing the data generated through SGD, or any other optimizer for that matter, in order to possibly discover (local) low-dimensional representations of the optimization landscape. As our vehicle for the exploration, we use diffusion maps introduced by R. Coifman and coauthors.</p></div>","PeriodicalId":100768,"journal":{"name":"Journal of Computational Mathematics and Data Science","volume":"4 ","pages":"Article 100054"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2772415822000177/pdfft?md5=38c0dff05f24faf5b0990bd6aa9fd984&pid=1-s2.0-S2772415822000177-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Deep learning, stochastic gradient descent and diffusion maps\",\"authors\":\"Carmina Fjellström,&nbsp;Kaj Nyström\",\"doi\":\"10.1016/j.jcmds.2022.100054\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Stochastic gradient descent (SGD) is widely used in deep learning due to its computational efficiency, but a complete understanding of why SGD performs so well remains a major challenge. It has been observed empirically that most eigenvalues of the Hessian of the loss functions on the loss landscape of over-parametrized deep neural networks are close to zero, while only a small number of eigenvalues are large. Zero eigenvalues indicate zero diffusion along the corresponding directions. This indicates that the process of minima selection mainly happens in the relatively low-dimensional subspace corresponding to the top eigenvalues of the Hessian. Although the parameter space is very high-dimensional, these findings seems to indicate that the SGD dynamics may mainly live on a low-dimensional manifold. In this paper, we pursue a truly data driven approach to the problem of getting a potentially deeper understanding of the high-dimensional parameter surface, and in particular, of the landscape traced out by SGD by analyzing the data generated through SGD, or any other optimizer for that matter, in order to possibly discover (local) low-dimensional representations of the optimization landscape. As our vehicle for the exploration, we use diffusion maps introduced by R. 
Coifman and coauthors.</p></div>\",\"PeriodicalId\":100768,\"journal\":{\"name\":\"Journal of Computational Mathematics and Data Science\",\"volume\":\"4 \",\"pages\":\"Article 100054\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2772415822000177/pdfft?md5=38c0dff05f24faf5b0990bd6aa9fd984&pid=1-s2.0-S2772415822000177-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computational Mathematics and Data Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2772415822000177\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computational Mathematics and Data Science","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772415822000177","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract


Stochastic gradient descent (SGD) is widely used in deep learning due to its computational efficiency, but a complete understanding of why SGD performs so well remains a major challenge. It has been observed empirically that most eigenvalues of the Hessian of the loss function on the loss landscape of over-parametrized deep neural networks are close to zero, while only a small number of eigenvalues are large. Zero eigenvalues indicate zero diffusion along the corresponding directions, which suggests that the process of minima selection mainly happens in the relatively low-dimensional subspace corresponding to the top eigenvalues of the Hessian. Although the parameter space is very high-dimensional, these findings seem to indicate that the SGD dynamics may mainly live on a low-dimensional manifold. In this paper, we pursue a truly data-driven approach to gaining a potentially deeper understanding of the high-dimensional parameter surface, and in particular of the landscape traced out by SGD, by analyzing the data generated by SGD, or by any other optimizer for that matter, in order to possibly discover (local) low-dimensional representations of the optimization landscape. As our vehicle for the exploration, we use the diffusion maps introduced by R. Coifman and coauthors.
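To make the pipeline the abstract outlines concrete, the following is a minimal, self-contained sketch, not the authors' implementation: run plain SGD on a small over-parametrized toy model, record the visited parameter vectors as a high-dimensional point cloud, and embed that cloud with a basic diffusion map in the Coifman-Lafon construction (Gaussian kernel, density normalization with alpha = 1, row-stochastic Markov matrix, leading non-trivial eigenvectors as coordinates). The toy network, the bandwidth heuristic for eps, the trajectory subsampling, and all hyperparameters are illustrative assumptions rather than choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy over-parametrized model y = W2 @ tanh(W1 @ x), trained with plain SGD.
# (Illustrative only; any model/optimizer producing parameter iterates would do.)
n_in, n_hidden, n_data = 2, 20, 200
X = rng.normal(size=(n_data, n_in))
y = np.sin(X[:, :1])                          # simple scalar target

W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
W2 = rng.normal(scale=0.5, size=(1, n_hidden))

def grads(xb, yb, W1, W2):
    """Mean-squared-error gradients for the two weight matrices."""
    h = np.tanh(xb @ W1.T)                    # (batch, hidden)
    err = h @ W2.T - yb                       # (batch, 1)
    gW2 = err.T @ h / len(xb)
    gW1 = ((err @ W2) * (1.0 - h ** 2)).T @ xb / len(xb)
    return gW1, gW2

# Record the SGD trajectory: one flattened parameter vector per step.
lr, batch, steps = 0.05, 16, 1500
trajectory = []
for _ in range(steps):
    idx = rng.choice(n_data, size=batch, replace=False)
    gW1, gW2 = grads(X[idx], y[idx], W1, W2)
    W1 -= lr * gW1
    W2 -= lr * gW2
    trajectory.append(np.concatenate([W1.ravel(), W2.ravel()]))
traj = np.array(trajectory)                   # (steps, n_params) point cloud

def diffusion_map(points, eps, n_coords=2, t=1):
    """Basic Coifman-Lafon diffusion map (alpha = 1) of a point cloud."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / eps)                     # Gaussian kernel
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q, q)              # remove sampling-density effects
    P = K_alpha / K_alpha.sum(axis=1, keepdims=True)   # row-stochastic Markov matrix
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)
    evals, evecs = evals.real[order], evecs.real[:, order]
    # Drop the trivial constant eigenvector; scale by lambda^t (diffusion time t).
    return evecs[:, 1:n_coords + 1] * evals[1:n_coords + 1] ** t

# Subsample the trajectory so the kernel matrix stays small, pick a bandwidth
# by the median of pairwise squared distances (a common heuristic), and embed.
sub = traj[::10]
eps = np.median(np.sum((sub[:, None, :] - sub[None, :, :]) ** 2, axis=-1))
coords = diffusion_map(sub, eps)
print(coords.shape)                           # (150, 2): candidate low-dim representation
```

The leading non-trivial diffusion coordinates then provide a candidate low-dimensional representation of the region of the loss landscape explored by the optimizer; in practice one would inspect the decay of the eigenvalues of the Markov matrix to decide how many coordinates carry meaningful structure.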
