Authors: K. Shibata, Hiroki Utsunomiya
Published in: The 2011 International Joint Conference on Neural Networks, 2011-10-03
DOI: 10.1109/IJCNN.2011.6033394 (https://doi.org/10.1109/IJCNN.2011.6033394)
Citation count: 18 (Semantic Scholar)
Discovery of pattern meaning from delayed rewards by reinforcement learning with a recurrent neural network
In this paper, the authors combine reinforcement learning with a recurrent neural network to try to explain why humans can discover the meaning of patterns and acquire appropriate behaviors based on it. Using a system with a real movable camera, they demonstrate a simple task in which the system discovers pattern meaning from delayed rewards through reinforcement learning with a recurrent neural network. The system receives a reward when it moves its camera in the direction of an arrow presented on a display. In each episode, one of four kinds of arrow is chosen at random, and the network input consists of 1,560 visual signals from the camera. After learning, the system could move its camera in the arrow direction. Some hidden neurons were found to represent the arrow direction independently of the presented arrow pattern and to retain it after the arrow disappeared from the image, even though no arrow was visible when the reward was given and the system was never told that the arrow direction matters for obtaining the reward. Generalization to some new arrow patterns and an associative memory function were also observed to some extent.
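The structure of the task described above (an arrow seen early, a reward given only later, when the arrow is no longer in view) can be illustrated with a small toy environment. This is only a sketch under stated assumptions, not the authors' system: the grid world, the names `ArrowEpisode` and `run_with_memory`, the distances, and the reward value are all hypothetical, and the observation is a symbolic label rather than 1,560 camera signals. No learning is performed here; the hand-written policy simply shows why some form of memory is needed to obtain the delayed reward.

```python
import random

# Hypothetical toy stand-in for the camera task. All names, sizes, and the
# reward value are illustrative assumptions, not taken from the paper.
DIRECTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


class ArrowEpisode:
    """One episode: the arrow is visible only while the camera is near the start."""

    def __init__(self, rng, goal_distance=3, visible_radius=1, max_steps=20):
        # One of four arrows is chosen at random for each episode.
        self.arrow = rng.choice(sorted(DIRECTIONS))
        self.goal_distance = goal_distance
        self.visible_radius = visible_radius
        self.max_steps = max_steps
        self.pos = (0, 0)  # camera offset from the display centre
        self.steps = 0

    def observe(self):
        # Once the camera has moved away, the arrow leaves the field of view,
        # so the direction must be remembered to reach the goal.
        x, y = self.pos
        visible = max(abs(x), abs(y)) <= self.visible_radius
        return self.arrow if visible else None

    def step(self, action):
        dx, dy = DIRECTIONS[action]
        self.pos = (self.pos[0] + dx, self.pos[1] + dy)
        self.steps += 1
        ax, ay = DIRECTIONS[self.arrow]
        goal = (ax * self.goal_distance, ay * self.goal_distance)
        reached = self.pos == goal
        done = reached or self.steps >= self.max_steps
        reward = 1.0 if reached else 0.0  # delayed: given only at the goal
        return self.observe(), reward, done


def run_with_memory(env):
    """Hand-written policy that remembers the first observed arrow."""
    remembered = env.observe()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(remembered)
        total += reward
    return total


if __name__ == "__main__":
    print(run_with_memory(ArrowEpisode(random.Random(0))))
```

In this sketch the observation is already `None` on the step where the reward arrives, mirroring the abstract's point that no arrow is seen at reward time. A learning agent would have to carry the direction in its internal state, which is the role the recurrent network plays in the paper.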