RADAR: Reactive and Deliberative Adaptive Reasoning - Learning When to Think Fast and When to Think Slow

Ørjan Strand, Didrik Spanne Reilstad, Zhenying Wu, Bruno C. da Silva, J. Tørresen, K. Ellefsen
2022 IEEE International Conference on Development and Learning (ICDL)
DOI: 10.1109/ICDL53763.2022.9962202
Published: 2022-09-12
Citations: 1

Abstract

When designing and deploying Reinforcement Learning (RL) algorithms, one typically selects a single value for the discount rate, which results in an agent that is always equally reactive or deliberative. However, like humans, RL agents can benefit from adapting their planning horizon to the current context. To enable this, we propose a novel algorithm: RADAR (Reactive and Deliberative Adaptive Reasoning). RADAR enables an agent to learn to adaptively choose a level of deliberation and reactivity according to the state it is in, given that in some cases one mode of operation is better than the other. Through experiments in a grid world, we verify that the RADAR agent can adapt its reasoning modality to the current context. In addition, we observe that the RADAR agent exhibits different preferences over its thinking modes when a penalty for mental effort is included in its mathematical formulation.
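The core idea — a per-state choice between a short-horizon "reactive" mode and a long-horizon "deliberative" mode, with an optional mental-effort cost — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the class name, the two-level Q-table layout, and the meta-policy update are all assumptions made for the sake of a minimal, runnable example.

```python
import numpy as np

class AdaptiveHorizonAgent:
    """Sketch of a RADAR-style agent (hypothetical, not the paper's code).

    One tabular Q-function is kept per discount factor (low gamma = reactive,
    high gamma = deliberative), plus a meta-level Q-table that learns, per
    state, which thinking mode to act under. Selecting the deliberative mode
    can be charged a mental-effort penalty at the meta level.
    """

    def __init__(self, n_states, n_actions, gammas=(0.5, 0.99),
                 alpha=0.1, effort_penalty=0.0):
        self.gammas = gammas
        self.alpha = alpha
        self.effort_penalty = effort_penalty  # cost of choosing the last (deliberative) mode
        # One action-value table per reasoning mode.
        self.Q = np.zeros((len(gammas), n_states, n_actions))
        # Meta-level values: which mode to use in each state.
        self.Q_meta = np.zeros((n_states, len(gammas)))

    def select_mode(self, state, eps=0.1):
        # Epsilon-greedy choice of thinking mode for this state.
        if np.random.rand() < eps:
            return np.random.randint(len(self.gammas))
        return int(np.argmax(self.Q_meta[state]))

    def select_action(self, mode, state, eps=0.1):
        # Epsilon-greedy action under the chosen mode's value function.
        if np.random.rand() < eps:
            return np.random.randint(self.Q.shape[2])
        return int(np.argmax(self.Q[mode, state]))

    def update(self, mode, s, a, r, s_next, done):
        # Standard Q-learning update for every mode's table,
        # each with its own discount factor.
        for m, g in enumerate(self.gammas):
            target = r + (0.0 if done else g * self.Q[m, s_next].max())
            self.Q[m, s, a] += self.alpha * (target - self.Q[m, s, a])
        # Meta update: the reward is reduced by the effort penalty
        # whenever the deliberative mode was selected.
        meta_r = r - (self.effort_penalty if mode == len(self.gammas) - 1 else 0.0)
        meta_gamma = max(self.gammas)
        meta_target = meta_r + (0.0 if done else meta_gamma * self.Q_meta[s_next].max())
        self.Q_meta[s, mode] += self.alpha * (meta_target - self.Q_meta[s, mode])
```

With `effort_penalty > 0`, the meta-table is pushed toward the reactive mode in states where deliberation yields no extra reward, which mirrors the preference shift the abstract reports when mental effort is penalized.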