Stochastic Gradient Descent Combines Second-Order Information for Training Neural Network

Minyu Chen
{"title":"随机梯度下降结合二阶信息的神经网络训练","authors":"Minyu Chen","doi":"10.1145/3274250.3274262","DOIUrl":null,"url":null,"abstract":"Deep learning is received special attention in the last decade following the increasing popularity of artificial intelligence. A successful deep learning application highly depends on an effective training neural network method. Currently, the first-order methods, e.g. stochastic gradient descent method may be the most widely-used method due to its simplicity and generally good performance. However, the first methods possess varied weakness, like lower convergence rate and easily stalking around saddle points for the nonconvex neural network problem. The second-order method, on the other hand, can address these issues by utilizing second derivative information, but the high computational cost of computing second-derivative information limits its usage. Based on these motivations, we design a new training schema that combine the advantages of first and second methods, meanwhile eliminate their disadvantages. To demonstrate its effectiveness, we test the new method on dataset, cifar-10. The results show the new approach performs as our desired.","PeriodicalId":410500,"journal":{"name":"Proceedings of the 2018 1st International Conference on Mathematics and Statistics","volume":"188 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Stochastic Gradient Descent Combines Second-Order Information for Training Neural Network\",\"authors\":\"Minyu Chen\",\"doi\":\"10.1145/3274250.3274262\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning is received special attention in the last decade following the increasing popularity of artificial intelligence. A successful deep learning application highly depends on an effective training neural network method. Currently, the first-order methods, e.g. stochastic gradient descent method may be the most widely-used method due to its simplicity and generally good performance. However, the first methods possess varied weakness, like lower convergence rate and easily stalking around saddle points for the nonconvex neural network problem. The second-order method, on the other hand, can address these issues by utilizing second derivative information, but the high computational cost of computing second-derivative information limits its usage. Based on these motivations, we design a new training schema that combine the advantages of first and second methods, meanwhile eliminate their disadvantages. To demonstrate its effectiveness, we test the new method on dataset, cifar-10. 
The results show the new approach performs as our desired.\",\"PeriodicalId\":410500,\"journal\":{\"name\":\"Proceedings of the 2018 1st International Conference on Mathematics and Statistics\",\"volume\":\"188 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 1st International Conference on Mathematics and Statistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3274250.3274262\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 1st International Conference on Mathematics and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3274250.3274262","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Deep learning has received special attention in the last decade, following the increasing popularity of artificial intelligence. A successful deep learning application depends heavily on an effective method for training the neural network. Currently, first-order methods such as stochastic gradient descent may be the most widely used, owing to their simplicity and generally good performance. However, first-order methods have several weaknesses, such as a slower convergence rate and a tendency to stall around saddle points in nonconvex neural network problems. Second-order methods, on the other hand, can address these issues by exploiting second-derivative information, but the high computational cost of computing second derivatives limits their use. Motivated by these observations, we design a new training scheme that combines the advantages of first- and second-order methods while eliminating their disadvantages. To demonstrate its effectiveness, we test the new method on the CIFAR-10 dataset. The results show that the new approach performs as desired.
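The abstract does not specify how the second-order information is incorporated, so the following PyTorch sketch is purely illustrative and should not be read as the paper's algorithm. It assumes one common way to blend cheap curvature information into SGD: a Hessian-vector product, obtained by double backpropagation at roughly the cost of one extra backward pass, sets the step length of an otherwise ordinary SGD update along the stochastic gradient direction. The function name `curvature_informed_sgd_step` and the `damping` safeguard are assumptions introduced here for illustration.

```python
import torch


def curvature_informed_sgd_step(model, loss_fn, x, y, base_lr=0.1, damping=1e-3):
    """One SGD step whose length is scaled by curvature along the gradient.

    Hypothetical sketch: combines a stochastic gradient (first-order) with a
    single Hessian-vector product (second-order) to choose the step size, as
    in a one-dimensional Newton step along the steepest-descent direction.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    loss = loss_fn(model(x), y)
    # First-order information: stochastic gradient g (graph kept for the HVP).
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Second-order information: Hessian-vector product Hg via double
    # backpropagation (Pearlmutter's trick), about one extra backward pass.
    grad_dot_g = sum((g * g.detach()).sum() for g in grads)
    hvp = torch.autograd.grad(grad_dot_g, params)

    # Local quadratic model along -g: the minimizing step is (g.g) / (g.Hg).
    gg = sum((g.detach() ** 2).sum() for g in grads)
    gHg = sum((g.detach() * h).sum() for g, h in zip(grads, hvp))
    # Guard against flat or negative curvature in the nonconvex setting.
    denom = max(gHg.item(), damping * gg.item(), 1e-12)
    step = gg.item() / denom

    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g.detach(), alpha=-base_lr * step)
    return loss.item()
```

In a training loop, `model` could be a small convolutional network, `loss_fn` an instance of `torch.nn.CrossEntropyLoss()`, and `(x, y)` a mini-batch drawn from CIFAR-10. When the curvature term `gHg` is small or negative, the damping clamp falls back toward a plain SGD-sized step, which is one simple way to avoid the instability that a raw Newton step would exhibit near saddle points.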