Reinforcement Learning in an Environment Synthetically Augmented with Digital Pheromones

Adv. Artif. Intell. | Pub Date: 2014-03-13 | DOI: 10.1155/2014/932485

Salvador E. Barbosa, Mikel D. Petty


Citations: 8

Abstract

Reinforcement learning requires information about states, actions, and outcomes as the basis for learning. For many applications, it can be difficult to construct a representative model of the environment, either because the required information is lacking or because the model's state space may become too large to allow a solution, based on the experience of prior actions, in a reasonable amount of time. An environment consisting solely of the occurrence or nonoccurrence of specific events attributable to a human actor may appear to lack the structure needed to position responding agents in time and space using reinforcement learning. Digital pheromones can be used to synthetically augment such an environment with event sequence information, creating a more persistent and measurable imprint on the environment that supports reinforcement learning. We implemented this method and combined it with the ability of agents to learn from actions not taken, a concept known as fictive learning. This approach was tested against the historical sequence of Somali maritime pirate attacks from 2005 to mid-2012, enabling a set of autonomous agents representing naval vessels to successfully respond to an average of 333 of the 899 pirate attacks, outperforming the historical record of 139 successes.
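The two ideas named in the abstract, a decaying pheromone imprint left by events and value updates for actions that were not taken, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `PheromoneGrid` class, the `fictive_update` function, the evaporation rate, the learning rate `alpha`, and the toy reward rule are all hypothetical.

```python
# Illustrative sketch only. All names, parameters, and the reward rule are
# assumptions for this example, not the authors' implementation.


class PheromoneGrid:
    """Grid of digital pheromone levels that decay over time."""

    def __init__(self, rows, cols, evaporation=0.1):
        self.evaporation = evaporation
        self.level = [[0.0] * cols for _ in range(rows)]

    def deposit(self, row, col, amount=1.0):
        # An event (e.g. an attack) leaves an imprint in its cell.
        self.level[row][col] += amount

    def evaporate(self):
        # Each time step, every cell keeps a fixed fraction of its pheromone.
        keep = 1.0 - self.evaporation
        self.level = [[v * keep for v in row] for row in self.level]


def fictive_update(q, state, rewards_by_action, alpha=0.5):
    """Move the value of every action, chosen or not, toward the reward
    that action would have earned in this state."""
    for action, reward in rewards_by_action.items():
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward - old)


grid = PheromoneGrid(rows=1, cols=3, evaporation=0.5)
grid.deposit(0, 2)   # an event occurs in the rightmost cell
grid.evaporate()     # one time step passes and the imprint decays

q = {}
# The agent was in the middle cell and stayed put; the event was to its right.
# The update also credits the moves the agent did not make.
fictive_update(
    q,
    state="middle",
    rewards_by_action={"left": 0.0, "stay": 0.0, "right": 1.0},
)
best = max(("left", "stay", "right"), key=lambda a: q[("middle", a)])
```

In this toy run the agent learns that moving right would have been the better choice even though it never took that action, which is the essence of learning from actions not taken. A real system would derive the state from the pheromone levels around each agent rather than from a fixed label.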

