Stochastic Gradient Descent Combines Second-Order Information for Training Neural Network

Minyu Chen
{"title":"随机梯度下降结合二阶信息的神经网络训练","authors":"Minyu Chen","doi":"10.1145/3274250.3274262","DOIUrl":null,"url":null,"abstract":"Deep learning is received special attention in the last decade following the increasing popularity of artificial intelligence. A successful deep learning application highly depends on an effective training neural network method. Currently, the first-order methods, e.g. stochastic gradient descent method may be the most widely-used method due to its simplicity and generally good performance. However, the first methods possess varied weakness, like lower convergence rate and easily stalking around saddle points for the nonconvex neural network problem. The second-order method, on the other hand, can address these issues by utilizing second derivative information, but the high computational cost of computing second-derivative information limits its usage. Based on these motivations, we design a new training schema that combine the advantages of first and second methods, meanwhile eliminate their disadvantages. To demonstrate its effectiveness, we test the new method on dataset, cifar-10. The results show the new approach performs as our desired.","PeriodicalId":410500,"journal":{"name":"Proceedings of the 2018 1st International Conference on Mathematics and Statistics","volume":"188 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Stochastic Gradient Descent Combines Second-Order Information for Training Neural Network\",\"authors\":\"Minyu Chen\",\"doi\":\"10.1145/3274250.3274262\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning is received special attention in the last decade following the increasing popularity of artificial intelligence. A successful deep learning application highly depends on an effective training neural network method. Currently, the first-order methods, e.g. stochastic gradient descent method may be the most widely-used method due to its simplicity and generally good performance. However, the first methods possess varied weakness, like lower convergence rate and easily stalking around saddle points for the nonconvex neural network problem. The second-order method, on the other hand, can address these issues by utilizing second derivative information, but the high computational cost of computing second-derivative information limits its usage. Based on these motivations, we design a new training schema that combine the advantages of first and second methods, meanwhile eliminate their disadvantages. To demonstrate its effectiveness, we test the new method on dataset, cifar-10. 
The results show the new approach performs as our desired.\",\"PeriodicalId\":410500,\"journal\":{\"name\":\"Proceedings of the 2018 1st International Conference on Mathematics and Statistics\",\"volume\":\"188 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 1st International Conference on Mathematics and Statistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3274250.3274262\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 1st International Conference on Mathematics and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3274250.3274262","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Deep learning has received special attention in the last decade, following the increasing popularity of artificial intelligence. A successful deep learning application depends heavily on an effective method for training the neural network. Currently, first-order methods such as stochastic gradient descent may be the most widely used, owing to their simplicity and generally good performance. However, first-order methods have several weaknesses, such as a slower convergence rate and a tendency to stall around saddle points in nonconvex neural network problems. Second-order methods, on the other hand, can address these issues by exploiting second-derivative information, but the high computational cost of computing second derivatives limits their use. Motivated by these observations, we design a new training scheme that combines the advantages of first- and second-order methods while eliminating their disadvantages. To demonstrate its effectiveness, we test the new method on the CIFAR-10 dataset. The results show that the new approach performs as desired.
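The abstract does not specify how the second-order information is incorporated, so the following PyTorch sketch is purely illustrative and should not be read as the paper's algorithm. It assumes one common way to blend cheap curvature information into SGD: a Hessian-vector product, obtained by double backpropagation at roughly the cost of one extra backward pass, sets the step length of an otherwise ordinary SGD update along the stochastic gradient direction. The function name `curvature_informed_sgd_step` and the `damping` safeguard are assumptions introduced here for illustration.

```python
import torch


def curvature_informed_sgd_step(model, loss_fn, x, y, base_lr=0.1, damping=1e-3):
    """One SGD step whose length is scaled by curvature along the gradient.

    Hypothetical sketch: combines a stochastic gradient (first-order) with a
    single Hessian-vector product (second-order) to choose the step size, as
    in a one-dimensional Newton step along the steepest-descent direction.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    loss = loss_fn(model(x), y)
    # First-order information: stochastic gradient g (graph kept for the HVP).
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Second-order information: Hessian-vector product Hg via double
    # backpropagation (Pearlmutter's trick), about one extra backward pass.
    grad_dot_g = sum((g * g.detach()).sum() for g in grads)
    hvp = torch.autograd.grad(grad_dot_g, params)

    # Local quadratic model along -g: the minimizing step is (g.g) / (g.Hg).
    gg = sum((g.detach() ** 2).sum() for g in grads)
    gHg = sum((g.detach() * h).sum() for g, h in zip(grads, hvp))
    # Guard against flat or negative curvature in the nonconvex setting.
    denom = max(gHg.item(), damping * gg.item(), 1e-12)
    step = gg.item() / denom

    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g.detach(), alpha=-base_lr * step)
    return loss.item()
```

In a training loop, `model` could be a small convolutional network, `loss_fn` an instance of `torch.nn.CrossEntropyLoss()`, and `(x, y)` a mini-batch drawn from CIFAR-10. When the curvature term `gHg` is small or negative, the damping clamp falls back toward a plain SGD-sized step, which is one simple way to avoid the instability that a raw Newton step would exhibit near saddle points.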