{"title":"Information Acquisition Driven by Reinforcement in Non-Deterministic Environments","authors":"N. Bynagari, Ruhul Amin","doi":"10.18034/ajtp.v6i3.569","DOIUrl":null,"url":null,"abstract":"What is the fastest way for an agent living in a non-deterministic Markov environment (NME) to learn about its statistical properties? The answer is to create \"optimal\" experiment sequences by carrying out action sequences that maximize expected knowledge gain. This idea is put into practice by integrating information theory and reinforcement learning techniques. Experiments demonstrate that the resulting method, reinforcement-driven information acquisition (RDIA), is substantially faster than standard random exploration for exploring particular NMEs. Exploration was studied apart from exploitation and we evaluated the performance of different reinforcement-driven information acquisition variations to that of traditional random exploration. \n ","PeriodicalId":433827,"journal":{"name":"American Journal of Trade and Policy","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Trade and Policy","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18034/ajtp.v6i3.569","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
What is the fastest way for an agent living in a non-deterministic Markov environment (NME) to learn about its statistical properties? The answer is to create "optimal" experiment sequences by carrying out action sequences that maximize expected knowledge gain. This idea is put into practice by integrating information theory and reinforcement learning techniques. Experiments demonstrate that the resulting method, reinforcement-driven information acquisition (RDIA), is substantially faster than standard random exploration for exploring particular NMEs. Exploration was studied apart from exploitation and we evaluated the performance of different reinforcement-driven information acquisition variations to that of traditional random exploration.