Towards Human-Level Safe Reinforcement Learning in Atari Library

Jurnal Sisfokom, Pub Date: 2023-11-06, DOI: 10.32736/sisfokom.v12i3.1739

Afriyadi Afriyadi, Wiranto Herry Utomo


Citations: 0

Abstract

Reinforcement learning (RL) is a powerful tool for training agents to perform complex tasks. However, RL agents often learn to behave in unsafe or unintended ways. This is especially true during the exploration phase, when the agent is still learning about its environment. This research adopts safe exploration methods from the field of robotics and evaluates their effectiveness against other algorithms that are commonly used in complex videogame environments without safe exploration. We also propose a method for hand-crafting catastrophic states, that is, states that are known to be unsafe for the agent to visit. Our results show that our method, together with our hand-crafted safety constraints, outperforms state-of-the-art algorithms at certain training iterations. This indicates that our method can learn to behave safely while still achieving good performance. These results have implications for the future development of human-level safe learning in combination with model-based RL in complex videogame environments. By developing safe exploration methods, we can help ensure that RL agents can be used in a variety of real-world applications, such as self-driving cars and robotics.
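The abstract does not say how the catastrophic states are encoded or how exploration is constrained, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical one-dimensional toy world instead of an Atari game, a known one-step transition function, and a hand-written set of catastrophic states; every name in it (CATASTROPHIC_STATES, next_state, safe_epsilon_greedy, run_episode) is illustrative.

```python
import random

# Hypothetical toy world: positions 0..5 on a line; position 0 is a "pit".
CATASTROPHIC_STATES = {0}  # hand-crafted by the designer, not learned
ACTIONS = (-1, +1)         # move left or move right


def next_state(state, action):
    """Deterministic toy transition (assumed known), clipped to 0..5."""
    return max(0, min(5, state + action))


def is_catastrophic(state):
    """Hand-crafted safety check: True if the state must never be visited."""
    return state in CATASTROPHIC_STATES


def safe_actions(state):
    """Keep only the actions whose successor state is not catastrophic."""
    return [a for a in ACTIONS if not is_catastrophic(next_state(state, a))]


def safe_epsilon_greedy(state, q_values, epsilon, rng):
    """Epsilon-greedy action selection restricted to the safe action set."""
    allowed = safe_actions(state)
    if not allowed:  # no safe action exists: fall back to all actions
        allowed = list(ACTIONS)
    if rng.random() < epsilon:
        return rng.choice(allowed)
    # Greedy choice; unseen (state, action) pairs default to a value of 0.0.
    return max(allowed, key=lambda a: q_values.get((state, a), 0.0))


def run_episode(start=1, steps=50, seed=0):
    """Explore fully at random (epsilon = 1.0) and record visited states."""
    rng = random.Random(seed)
    state, visited = start, [start]
    for _ in range(steps):
        action = safe_epsilon_greedy(state, {}, epsilon=1.0, rng=rng)
        state = next_state(state, action)
        visited.append(state)
    return visited
```

Calling run_episode() explores purely at random, yet never enters state 0, because the action that leads there is filtered out before sampling. In an Atari setting the successor state would not be available this cheaply, so a sketch like this would presumably need a learned or emulated model in place of next_state.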
