{"title":"时间统计在情境依赖强化学习中经验迁移中的作用","authors":"Oussama H. Hamid","doi":"10.1109/HIS.2014.7086184","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) is an algorithmic theory for learning by experience optimal action control. Two widely discussed problems within this field are the temporal credit assignment problem and the transfer of experience. The temporal credit assignment problem postulates that deciding whether an action is good or bad may not be done upon right away because of delayed rewards. The problem of transferring experience investigates the question of how experience can be generalized and transferred from a familiar context, where it was acquired, to an unfamiliar context, where it may, nevertheless, prove helpful. We propose a controller for modelling such flexibility in a context-dependent reinforcement learning paradigm. The devised controller combines two alternatives of perfect learner algorithms. In the first alternative, rewards are predicted by individual objects presented in a temporal sequence. In the second alternative, rewards are predicted on the basis of successive pairs of objects. Simulations run on both deterministic and random temporal sequences show that only in case of deterministic sequences, a previously acquired context could be retrieved. This suggests a role of temporal sequence information in the generalization and transfer of experience.","PeriodicalId":161103,"journal":{"name":"2014 14th International Conference on Hybrid Intelligent Systems","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"The role of temporal statistics in the transfer of experience in context-dependent reinforcement learning\",\"authors\":\"Oussama H. Hamid\",\"doi\":\"10.1109/HIS.2014.7086184\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) is an algorithmic theory for learning by experience optimal action control. Two widely discussed problems within this field are the temporal credit assignment problem and the transfer of experience. The temporal credit assignment problem postulates that deciding whether an action is good or bad may not be done upon right away because of delayed rewards. The problem of transferring experience investigates the question of how experience can be generalized and transferred from a familiar context, where it was acquired, to an unfamiliar context, where it may, nevertheless, prove helpful. We propose a controller for modelling such flexibility in a context-dependent reinforcement learning paradigm. The devised controller combines two alternatives of perfect learner algorithms. In the first alternative, rewards are predicted by individual objects presented in a temporal sequence. In the second alternative, rewards are predicted on the basis of successive pairs of objects. Simulations run on both deterministic and random temporal sequences show that only in case of deterministic sequences, a previously acquired context could be retrieved. 
This suggests a role of temporal sequence information in the generalization and transfer of experience.\",\"PeriodicalId\":161103,\"journal\":{\"name\":\"2014 14th International Conference on Hybrid Intelligent Systems\",\"volume\":\"81 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 14th International Conference on Hybrid Intelligent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HIS.2014.7086184\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 14th International Conference on Hybrid Intelligent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HIS.2014.7086184","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Reinforcement learning (RL) is an algorithmic framework for learning optimal action control from experience. Two widely discussed problems in this field are the temporal credit assignment problem and the transfer of experience. The temporal credit assignment problem refers to the difficulty that, because rewards may be delayed, an action cannot always be judged as good or bad immediately after it is taken. The problem of transferring experience asks how experience acquired in a familiar context can be generalized and applied in an unfamiliar context, where it may nevertheless prove helpful. We propose a controller for modelling such flexibility in a context-dependent reinforcement learning paradigm. The devised controller combines two alternatives of perfect learner algorithms. In the first alternative, rewards are predicted from individual objects presented in a temporal sequence; in the second, rewards are predicted from successive pairs of objects. Simulations on both deterministic and random temporal sequences show that a previously acquired context can be retrieved only when the sequences are deterministic. This suggests a role for temporal sequence information in the generalization and transfer of experience.
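The abstract describes the design but gives no equations or parameters, so the following is a minimal Python sketch of the comparison it describes, under assumptions of my own: simple delta-rule value updates stand in for the paper's "perfect learner" algorithms, two contexts carry opposite object-reward mappings, and in the deterministic case each context uses its own fixed presentation order whose successive pairs do not overlap. All concrete details (object names, reward values, block length, learning rate) are illustrative, not taken from the paper.

```python
import random

OBJECTS = ["A", "B", "C", "D"]

# Hypothetical context-dependent rewards (illustrative values, not the paper's):
# in context 0, objects A and B pay off; in context 1, C and D do.
REWARD = {0: {"A": 1.0, "B": 1.0, "C": 0.0, "D": 0.0},
          1: {"A": 0.0, "B": 0.0, "C": 1.0, "D": 1.0}}

# In the deterministic case each context has its own fixed cyclic order;
# these two orders share no successive pair of objects.
ORDER = {0: ["A", "B", "C", "D"], 1: ["D", "C", "B", "A"]}

def block(ctx, n, deterministic):
    """n trials in a single context, as (object, context) tuples."""
    if deterministic:
        objs = [ORDER[ctx][i % 4] for i in range(n)]
    else:
        objs = [random.choice(OBJECTS) for _ in range(n)]
    return [(o, ctx) for o in objs]

def simulate(deterministic, n=200, alpha=0.2):
    """Context 0, then context 1, then back to context 0. Returns each
    learner's mean absolute prediction error on re-entering context 0."""
    trials = (block(0, n, deterministic) + block(1, n, deterministic)
              + block(0, n, deterministic))
    v_single = {}   # delta-rule value table: object -> predicted reward
    v_pair = {}     # delta-rule value table: (previous, current) -> reward
    prev = None
    err_single = err_pair = 0.0
    for t, (obj, ctx) in enumerate(trials):
        r = REWARD[ctx][obj]
        p1 = v_single.get(obj, 0.0)          # alternative 1: single object
        p2 = v_pair.get((prev, obj), 0.0)    # alternative 2: object pair
        if t >= 2 * n:                       # score the re-entry block only
            err_single += abs(r - p1)
            err_pair += abs(r - p2)
        v_single[obj] = p1 + alpha * (r - p1)
        v_pair[(prev, obj)] = p2 + alpha * (r - p2)
        prev = obj
    return err_single / n, err_pair / n

if __name__ == "__main__":
    random.seed(0)
    for det in (True, False):
        s, p = simulate(det)
        label = "deterministic" if det else "random"
        print(f"{label:>13}: single-object error {s:.3f}, pair error {p:.3f}")
```

Under these assumptions, the pair-based predictor keeps near-zero error when context 0 returns only in the deterministic case, because successive pairs, unlike individual objects, are unique to a context when the presentation order is fixed; with random sequences both value tables are overwritten during context 1 and neither learner retrieves the old context. This mirrors the abstract's claim that temporal sequence structure is what permits retrieval of a previously acquired context.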