RADAR: Reactive and Deliberative Adaptive Reasoning - Learning When to Think Fast and When to Think Slow

Ørjan Strand, Didrik Spanne Reilstad, Zhenying Wu, Bruno C. da Silva, J. Tørresen, K. Ellefsen

2022 IEEE International Conference on Development and Learning (ICDL), published 2022-09-12. DOI: 10.1109/ICDL53763.2022.9962202
When designing and deploying Reinforcement Learning (RL) algorithms, one typically selects a single value for the discount rate, which yields an agent that is always equally reactive or deliberative. However, much like humans, RL agents can benefit from adapting their planning horizon to the current context. To enable this, we propose a novel algorithm: RADAR: Reactive and Deliberative Adaptive Reasoning. RADAR enables an agent to learn to adaptively choose a level of deliberation and reactivity according to the state it is in, since there are situations in which one mode of operation outperforms the other. Through experiments in a grid world, we verify that the RADAR agent can adapt its reasoning modality to the current context. In addition, we observe that the RADAR agent exhibits different preferences regarding its thinking modes when a penalty for mental effort is included in its mathematical formulation.
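The core idea — a meta-level choice between a short-horizon ("fast") and long-horizon ("slow") value function, with an optional effort penalty for deliberation — can be illustrated with a minimal sketch. This is NOT the paper's actual formulation; the corridor environment, the two discount rates, the effort penalty constant, and all names below are illustrative assumptions. A meta Q-table learns, per state, which thinking mode to use, while one action-value table per mode learns the same transitions under its own discount rate:

```python
import random

random.seed(0)

# Illustrative sketch, not RADAR itself: a meta-policy picks, per state,
# a "fast" (low discount, reactive) or "slow" (high discount, deliberative)
# mode; the "slow" mode is charged an assumed mental-effort penalty.
# Environment: a 1-D corridor with the goal at the right end.

N = 10                                  # corridor length; goal at state N-1
ACTIONS = [-1, +1]                      # move left / right
GAMMAS = {"fast": 0.5, "slow": 0.99}    # hypothetical planning horizons
EFFORT_PENALTY = 0.01                   # assumed cost of deliberating
MODES = list(GAMMAS)

# One action-value table per thinking mode, plus a meta-table over modes.
q = {m: [[0.0, 0.0] for _ in range(N)] for m in GAMMAS}
q_meta = [[0.0, 0.0] for _ in range(N)]

def greedy(row):
    """Argmax with random tie-breaking."""
    best = max(row)
    return random.choice([i for i, v in enumerate(row) if v == best])

def step(s, a):
    s2 = max(0, min(N - 1, s + a))
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

alpha, eps = 0.2, 0.2
for _ in range(500):
    s, done = 0, False
    for _ in range(200):                # cap episode length
        if done:
            break
        # Meta level: choose a thinking mode for the current state.
        mi = random.randrange(2) if random.random() < eps else greedy(q_meta[s])
        mode = MODES[mi]
        # Action level: act under the chosen mode's value table.
        ai = random.randrange(2) if random.random() < eps else greedy(q[mode][s])
        s2, r, done = step(s, ACTIONS[ai])
        # Every mode learns the same transition with its own discount rate.
        for m, g in GAMMAS.items():
            q[m][s][ai] += alpha * (r + g * max(q[m][s2]) - q[m][s][ai])
        # The meta-table sees the reward minus the effort cost of deliberating.
        r_meta = r - (EFFORT_PENALTY if mode == "slow" else 0.0)
        q_meta[s][mi] += alpha * (r_meta + 0.99 * max(q_meta[s2]) - q_meta[s][mi])
        s = s2
```

In this toy setting both modes eventually solve the corridor, so the effort penalty is what differentiates them at the meta level; the paper's grid-world experiments instead construct states where reactive and deliberative reasoning genuinely disagree.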