{"title":"在急剧变化的环境中连续动作的改进随机突触强化学习","authors":"Syed Naveed Hussain Shah, Dean Frederick Hougen","doi":"10.1109/IJCNN48605.2020.9207622","DOIUrl":null,"url":null,"abstract":"Reinforcement learning in continuous action spaces requires mechanisms that allow for exploration of infinite possible actions. One challenging issue in such systems is the amount of exploration appropriate during learning. This issue is complicated further in sharply changing dynamic environments. Reinforcement learning in artificial neural networks with multiparameter distributions can address all aspects of these issues. However, which equations are most appropriate for updating these parameters remains an open question. Here we consider possible equations derived from two sources: The classic equations proposed for REINFORCE and modern equations introduced for Stochastic Synapse Reinforcement Learning (SSRL), as well as combinations thereof and variations thereon. Using a set of multidimensional robot inverse kinematics problems, we find that novel combinations of these equations outperform either set of equations alone in terms of both learning rate and consistency.","PeriodicalId":134599,"journal":{"name":"IEEE International Joint Conference on Neural Network","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improved Stochastic Synapse Reinforcement Learning for Continuous Actions in Sharply Changing Environments\",\"authors\":\"Syed Naveed Hussain Shah, Dean Frederick Hougen\",\"doi\":\"10.1109/IJCNN48605.2020.9207622\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning in continuous action spaces requires mechanisms that allow for exploration of infinite possible actions. One challenging issue in such systems is the amount of exploration appropriate during learning. This issue is complicated further in sharply changing dynamic environments. Reinforcement learning in artificial neural networks with multiparameter distributions can address all aspects of these issues. However, which equations are most appropriate for updating these parameters remains an open question. Here we consider possible equations derived from two sources: The classic equations proposed for REINFORCE and modern equations introduced for Stochastic Synapse Reinforcement Learning (SSRL), as well as combinations thereof and variations thereon. 
Using a set of multidimensional robot inverse kinematics problems, we find that novel combinations of these equations outperform either set of equations alone in terms of both learning rate and consistency.\",\"PeriodicalId\":134599,\"journal\":{\"name\":\"IEEE International Joint Conference on Neural Network\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Joint Conference on Neural Network\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN48605.2020.9207622\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Joint Conference on Neural Network","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN48605.2020.9207622","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improved Stochastic Synapse Reinforcement Learning for Continuous Actions in Sharply Changing Environments
Reinforcement learning in continuous action spaces requires mechanisms that allow for exploration of infinitely many possible actions. One challenging issue in such systems is determining the amount of exploration appropriate during learning. This issue is complicated further in sharply changing dynamic environments. Reinforcement learning in artificial neural networks with multiparameter distributions can address all aspects of these issues. However, which equations are most appropriate for updating these parameters remains an open question. Here we consider possible equations derived from two sources: the classic equations proposed for REINFORCE and the modern equations introduced for Stochastic Synapse Reinforcement Learning (SSRL), as well as combinations thereof and variations thereon. Using a set of multidimensional robot inverse kinematics problems, we find that novel combinations of these equations outperform either set of equations alone in terms of both learning rate and consistency.
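The abstract contrasts the classic REINFORCE update equations with the SSRL equations for adapting the parameters of a multiparameter (e.g., Gaussian) action distribution. As background only, the minimal sketch below shows the classic REINFORCE updates for a single Gaussian unit (Williams, 1992) on a toy one-dimensional task; the reward function, baseline, and learning rates are illustrative assumptions, and this does not reproduce the paper's SSRL equations or the novel combinations it evaluates.

import numpy as np

# Minimal sketch, assuming a toy 1-D task: classic REINFORCE updates for a
# single Gaussian unit with adaptable mean and standard deviation.
rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0                 # parameters of the action distribution
alpha_mu, alpha_sigma = 0.05, 0.02   # learning rates (assumed values)
baseline = 0.0                       # running reward baseline
target = 2.0                         # hypothetical target the action should reach

for step in range(2000):
    a = rng.normal(mu, sigma)        # sample a continuous action
    r = -abs(a - target)             # hypothetical reward: closer is better

    # Characteristic eligibilities of the Gaussian: d ln N(a; mu, sigma) / d param
    e_mu = (a - mu) / sigma**2
    e_sigma = ((a - mu) ** 2 - sigma**2) / sigma**3

    # REINFORCE parameter updates: learning rate * (reward - baseline) * eligibility
    mu += alpha_mu * (r - baseline) * e_mu
    sigma += alpha_sigma * (r - baseline) * e_sigma
    sigma = max(sigma, 1e-3)         # keep the standard deviation positive

    baseline += 0.05 * (r - baseline)  # exponential moving-average baseline

print(f"learned mu={mu:.3f}, sigma={sigma:.3f}")  # mu should approach the target

The paper's question is which such parameter-update equations, the REINFORCE form above, the SSRL form, or combinations and variations of the two, learn fastest and most consistently when the environment changes sharply.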