RADAR: Reactive and Deliberative Adaptive Reasoning - Learning When to Think Fast and When to Think Slow

Ørjan Strand, Didrik Spanne Reilstad, Zhenying Wu, Bruno C. da Silva, J. Tørresen, K. Ellefsen

2022 IEEE International Conference on Development and Learning (ICDL), published 2022-09-12. DOI: 10.1109/ICDL53763.2022.9962202
When designing and deploying Reinforcement Learning (RL) algorithms, one typically selects a single value for the discount rate, which yields an agent that is always equally reactive or deliberative. However, much like humans, RL agents can benefit from adapting their planning horizon to the current context. To enable this, we propose a novel algorithm: RADAR: Reactive and Deliberative Adaptive Reasoning. RADAR enables an agent to learn to adaptively choose a level of deliberation and reactivity according to the state it is in, since there are situations in which one mode of operation outperforms the other. Through experiments in a grid world, we verify that the RADAR agent can adapt its reasoning modality to the current context. In addition, we observe that the RADAR agent exhibits different preferences regarding its thinking modes when a penalty for mental effort is included in its mathematical formulation.
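The core idea — a meta-level choice between a short-horizon ("fast") and long-horizon ("slow") value function, with an optional effort penalty for deliberation — can be illustrated with a minimal sketch. This is NOT the paper's actual formulation; the corridor environment, the two discount rates, the effort penalty constant, and all names below are illustrative assumptions. A meta Q-table learns, per state, which thinking mode to use, while one action-value table per mode learns the same transitions under its own discount rate:

```python
import random

random.seed(0)

# Illustrative sketch, not RADAR itself: a meta-policy picks, per state,
# a "fast" (low discount, reactive) or "slow" (high discount, deliberative)
# mode; the "slow" mode is charged an assumed mental-effort penalty.
# Environment: a 1-D corridor with the goal at the right end.

N = 10                                  # corridor length; goal at state N-1
ACTIONS = [-1, +1]                      # move left / right
GAMMAS = {"fast": 0.5, "slow": 0.99}    # hypothetical planning horizons
EFFORT_PENALTY = 0.01                   # assumed cost of deliberating
MODES = list(GAMMAS)

# One action-value table per thinking mode, plus a meta-table over modes.
q = {m: [[0.0, 0.0] for _ in range(N)] for m in GAMMAS}
q_meta = [[0.0, 0.0] for _ in range(N)]

def greedy(row):
    """Argmax with random tie-breaking."""
    best = max(row)
    return random.choice([i for i, v in enumerate(row) if v == best])

def step(s, a):
    s2 = max(0, min(N - 1, s + a))
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

alpha, eps = 0.2, 0.2
for _ in range(500):
    s, done = 0, False
    for _ in range(200):                # cap episode length
        if done:
            break
        # Meta level: choose a thinking mode for the current state.
        mi = random.randrange(2) if random.random() < eps else greedy(q_meta[s])
        mode = MODES[mi]
        # Action level: act under the chosen mode's value table.
        ai = random.randrange(2) if random.random() < eps else greedy(q[mode][s])
        s2, r, done = step(s, ACTIONS[ai])
        # Every mode learns the same transition with its own discount rate.
        for m, g in GAMMAS.items():
            q[m][s][ai] += alpha * (r + g * max(q[m][s2]) - q[m][s][ai])
        # The meta-table sees the reward minus the effort cost of deliberating.
        r_meta = r - (EFFORT_PENALTY if mode == "slow" else 0.0)
        q_meta[s][mi] += alpha * (r_meta + 0.99 * max(q_meta[s2]) - q_meta[s][mi])
        s = s2
```

In this toy setting both modes eventually solve the corridor, so the effort penalty is what differentiates them at the meta level; the paper's grid-world experiments instead construct states where reactive and deliberative reasoning genuinely disagree.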