{"title":"Epoch-incremental强化学习算法","authors":"R. Zajdel","doi":"10.2478/amcs-2013-0047","DOIUrl":null,"url":null,"abstract":"Abstract In this article, a new class of the epoch-incremental reinforcement learning algorithm is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, on the basis of the environment model, the distances of past-active states to the terminal state are computed. These distances and the reinforcement terminal state signal are used to improve the agent policy.","PeriodicalId":253470,"journal":{"name":"International Journal of Applied Mathematics and Computer Sciences","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Epoch-incremental reinforcement learning algorithms\",\"authors\":\"R. Zajdel\",\"doi\":\"10.2478/amcs-2013-0047\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract In this article, a new class of the epoch-incremental reinforcement learning algorithm is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, on the basis of the environment model, the distances of past-active states to the terminal state are computed. These distances and the reinforcement terminal state signal are used to improve the agent policy.\",\"PeriodicalId\":253470,\"journal\":{\"name\":\"International Journal of Applied Mathematics and Computer Sciences\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Applied Mathematics and Computer Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2478/amcs-2013-0047\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Applied Mathematics and Computer Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/amcs-2013-0047","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, the distances of past-active states to the terminal state are computed on the basis of the environment model. These distances, together with the terminal-state reinforcement signal, are used to improve the agent's policy.
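To make the two-mode idea from the abstract concrete, below is a minimal sketch of how an epoch-incremental agent might be organized: a TD(0)-style incremental update that also records a deterministic transition model, followed by an epoch-mode pass that breadth-first searches the learned model backwards from the terminal state and backs up a distance-discounted terminal reinforcement to past-active states. All names (EpochIncrementalAgent, incremental_step, epoch_update, the tabular layout, the deterministic-model assumption) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of the epoch-incremental scheme described in the abstract.
# Assumes a small discrete state/action space and a deterministic environment.
import random
from collections import deque


class EpochIncrementalAgent:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.model = {}        # learned transition model: (state, action) -> next_state
        self.visited = set()   # past-active states of the current epoch (episode)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, s):
        # Epsilon-greedy action selection over the tabular Q-values.
        if random.random() < self.epsilon:
            return random.randrange(len(self.Q[s]))
        return max(range(len(self.Q[s])), key=lambda a: self.Q[s][a])

    def incremental_step(self, s, a, r, s_next):
        # Incremental mode: a TD(0)-style update plus online model building.
        best_next = max(self.Q[s_next])
        self.Q[s][a] += self.alpha * (r + self.gamma * best_next - self.Q[s][a])
        self.model[(s, a)] = s_next
        self.visited.add(s)

    def epoch_update(self, terminal_state, terminal_reward):
        # Epoch mode: compute each past-active state's distance to the terminal
        # state by breadth-first search over the reversed learned model, then
        # refresh Q-values with a distance-discounted terminal reinforcement.
        predecessors = {}
        for (s, a), s_next in self.model.items():
            predecessors.setdefault(s_next, []).append((s, a))

        dist = {terminal_state: 0}
        queue = deque([terminal_state])
        while queue:
            s_next = queue.popleft()
            for s, a in predecessors.get(s_next, []):
                if s not in dist:
                    dist[s] = dist[s_next] + 1
                    queue.append(s)
                if s in self.visited:
                    # Reaching the terminal reward through (s, a) takes
                    # dist[s_next] further steps after landing in s_next.
                    target = (self.gamma ** dist[s_next]) * terminal_reward
                    self.Q[s][a] = max(self.Q[s][a], target)
        self.visited.clear()
```

In this sketch the epoch-mode pass plays the role the abstract assigns to it: it propagates the terminal reinforcement to all past-active states in one sweep, rather than waiting for many incremental TD backups to carry the information along the trajectory.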