A novel neural-network gradient optimization algorithm based on reinforcement learning

Authors: Lei Lv, Ziming Chen, Zhenyu Lu
Published in: 2019 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), 2019-12-01
DOI: 10.1109/SPAC49953.2019.237884
Finding an appropriate step size and hyperparameters is key to achieving robust convergence in gradient descent optimization. This study proposes a novel gradient descent strategy based on reinforcement learning, in which the gradient information at each time step is expressed as the state of a Markov decision process during the iterative optimization of a neural network. We design a variable-view-distance planner, with a Markov decision process as its recursive core, for neural-network gradient descent. The planner combines the advantages of model-free and model-based learning, fully exploiting the state-transition information of the optimized neural-network objective function at each step. Experimental results show that the proposed method retains the merits of a model-free asymptotically optimal strategy while achieving higher sample efficiency than manually designed optimization algorithms.
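The abstract does not specify the planner's internals, but the core idea — casting gradient descent as a Markov decision process whose state is gradient information, whose action is the step size, and whose reward is the resulting loss decrease — can be illustrated with a toy tabular Q-learning sketch. Everything below (the quadratic objective, the `LR_CHOICES` action set, the gradient-norm state discretization, and all hyperparameters) is an assumption for illustration, not the authors' method.

```python
import numpy as np

def loss(w):
    # Simple quadratic objective standing in for a neural-network loss.
    return 0.5 * np.sum(w ** 2)

def grad(w):
    # Gradient of the quadratic objective above.
    return w

LR_CHOICES = [0.01, 0.1, 0.5]  # discrete action set: candidate step sizes (assumed)

def discretize(g):
    # State: coarse log-scale bucket of the gradient norm (illustrative choice).
    return int(np.clip(np.log10(np.linalg.norm(g) + 1e-12) + 6, 0, 11))

def train(steps=200, epsilon=0.1, alpha=0.2, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((12, len(LR_CHOICES)))  # tabular Q-values over (state, action)
    w = rng.normal(size=5)
    for _ in range(steps):
        s = discretize(grad(w))
        # Epsilon-greedy choice of step size.
        if rng.random() < epsilon:
            a = int(rng.integers(len(LR_CHOICES)))
        else:
            a = int(np.argmax(q[s]))
        prev = loss(w)
        w = w - LR_CHOICES[a] * grad(w)   # one gradient-descent update
        r = prev - loss(w)                # reward: decrease in the loss
        s2 = discretize(grad(w))
        # Standard one-step Q-learning update.
        q[s, a] += alpha * (r + gamma * np.max(q[s2]) - q[s, a])
    return w

w = train()
print(loss(w))  # the learned step-size policy drives the loss toward zero
```

Because larger step sizes yield larger loss decreases on this well-conditioned objective, the greedy policy quickly settles on the biggest stable step size; the paper's planner additionally exploits state-transition information in a model-based fashion, which this model-free sketch does not attempt to reproduce.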