{"title":"随机梯度下降结合二阶信息的神经网络训练","authors":"Minyu Chen","doi":"10.1145/3274250.3274262","DOIUrl":null,"url":null,"abstract":"Deep learning is received special attention in the last decade following the increasing popularity of artificial intelligence. A successful deep learning application highly depends on an effective training neural network method. Currently, the first-order methods, e.g. stochastic gradient descent method may be the most widely-used method due to its simplicity and generally good performance. However, the first methods possess varied weakness, like lower convergence rate and easily stalking around saddle points for the nonconvex neural network problem. The second-order method, on the other hand, can address these issues by utilizing second derivative information, but the high computational cost of computing second-derivative information limits its usage. Based on these motivations, we design a new training schema that combine the advantages of first and second methods, meanwhile eliminate their disadvantages. To demonstrate its effectiveness, we test the new method on dataset, cifar-10. The results show the new approach performs as our desired.","PeriodicalId":410500,"journal":{"name":"Proceedings of the 2018 1st International Conference on Mathematics and Statistics","volume":"188 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Stochastic Gradient Descent Combines Second-Order Information for Training Neural Network\",\"authors\":\"Minyu Chen\",\"doi\":\"10.1145/3274250.3274262\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning is received special attention in the last decade following the increasing popularity of artificial intelligence. A successful deep learning application highly depends on an effective training neural network method. Currently, the first-order methods, e.g. stochastic gradient descent method may be the most widely-used method due to its simplicity and generally good performance. However, the first methods possess varied weakness, like lower convergence rate and easily stalking around saddle points for the nonconvex neural network problem. The second-order method, on the other hand, can address these issues by utilizing second derivative information, but the high computational cost of computing second-derivative information limits its usage. Based on these motivations, we design a new training schema that combine the advantages of first and second methods, meanwhile eliminate their disadvantages. To demonstrate its effectiveness, we test the new method on dataset, cifar-10. 
The results show the new approach performs as our desired.\",\"PeriodicalId\":410500,\"journal\":{\"name\":\"Proceedings of the 2018 1st International Conference on Mathematics and Statistics\",\"volume\":\"188 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 1st International Conference on Mathematics and Statistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3274250.3274262\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 1st International Conference on Mathematics and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3274250.3274262","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Stochastic Gradient Descent Combines Second-Order Information for Training Neural Network
Deep learning has received special attention in the last decade following the increasing popularity of artificial intelligence. A successful deep learning application depends heavily on an effective method for training the neural network. Currently, first-order methods such as stochastic gradient descent (SGD) are perhaps the most widely used, owing to their simplicity and generally good performance. However, first-order methods have various weaknesses, such as a lower convergence rate and a tendency to stall around saddle points in the nonconvex neural network problem. Second-order methods, on the other hand, can address these issues by utilizing second-derivative information, but the high computational cost of obtaining that information limits their usage. Motivated by this, we design a new training scheme that combines the advantages of first- and second-order methods while eliminating their disadvantages. To demonstrate its effectiveness, we test the new method on the CIFAR-10 dataset. The results show that the new approach performs as desired.
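The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea it describes: take cheap stochastic-gradient steps, but rescale them with approximate second-derivative (curvature) information so the step is better conditioned. The toy least-squares problem, the diagonal-curvature estimate, and all names used here (`minibatch_grad_and_curv`, `damping`, and so on) are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# Sketch (not the paper's method): SGD steps preconditioned by a cheap
# diagonal curvature estimate computed from the same mini-batch.

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize 0.5 * ||A w - b||^2 over w,
# with columns of A scaled unevenly so curvature varies by coordinate.
n_samples, n_features = 512, 20
A = rng.normal(size=(n_samples, n_features)) * rng.uniform(0.1, 3.0, size=n_features)
w_true = rng.normal(size=n_features)
b = A @ w_true + 0.01 * rng.normal(size=n_samples)


def minibatch_grad_and_curv(w, batch_size=32):
    """Stochastic gradient plus a diagonal curvature estimate from one mini-batch."""
    idx = rng.choice(n_samples, size=batch_size, replace=False)
    Ab, bb = A[idx], b[idx]
    resid = Ab @ w - bb
    grad = Ab.T @ resid / batch_size          # first-order information
    diag_curv = np.mean(Ab ** 2, axis=0)      # diagonal of the mini-batch Hessian
    return grad, diag_curv


w = np.zeros(n_features)
lr, damping = 0.5, 1e-3
for step in range(500):
    grad, diag_curv = minibatch_grad_and_curv(w)
    # Plain SGD would be: w -= lr * grad.
    # Here the step is rescaled coordinate-wise by the damped curvature estimate.
    w -= lr * grad / (diag_curv + damping)

print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```

Replacing the preconditioned update with plain `w -= lr * grad` recovers standard SGD, which makes the trade-off the abstract points to, cheap first-order steps versus curvature-aware steps, easy to compare on the same toy problem.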