Reinforcement Learning Interventions on Boundedly Rational Human Agents in Frictionful Tasks.

Eura Nofshin, Siddharth Swaroop, Weiwei Pan, Susan Murphy, Finale Doshi-Velez
Published in: Proceedings of the ... International Joint Conference on Autonomous Agents and Multiagent Systems : AAMAS, volume 2024, pages 1482-1491. Publication date: 2024-05-01 (Epub 2024-05-06). Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11460771/pdf/
Citations: 0

Abstract

Many important behavior changes are frictionful; they require individuals to expend effort over a long period with little immediate gratification. Here, an artificial intelligence (AI) agent can provide personalized interventions to help individuals stick to their goals. In these settings, the AI agent must personalize rapidly (before the individual disengages) and interpretably, to help us understand the behavioral interventions. In this paper, we introduce Behavior Model Reinforcement Learning (BMRL), a framework in which an AI agent intervenes on the parameters of a Markov Decision Process (MDP) belonging to a boundedly rational human agent. Our formulation of the human decision-maker as a planning agent allows us to attribute undesirable human policies (ones that do not lead to the goal) to their maladapted MDP parameters, such as an extremely low discount factor. Furthermore, we propose a class of tractable human models that captures fundamental behaviors in frictionful tasks. Introducing a notion of MDP equivalence specific to BMRL, we theoretically and empirically show that AI planning with our human models can lead to helpful policies on a wide range of more complex, ground-truth humans.
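
The idea of attributing an unhelpful human policy to a maladapted MDP parameter can be illustrated with a toy example. The sketch below is not the authors' implementation: the chain world, the reward and cost values, and the `intervene` function that raises the human's discount factor are all assumptions made for illustration. It models the human as a planner who runs value iteration on a short chain where each step of work costs effort and only the final step pays off.

```python
from typing import List

# All constants below are illustrative assumptions, not values from the paper.
N_STATES = 6         # chain of states 0..5; state 5 is the goal
GOAL_REWARD = 10.0   # payoff received on the step that enters the goal
EFFORT_COST = 1.0    # per-step friction paid whenever the human works
REST, WORK = 0, 1


def plan_human_policy(gamma: float, sweeps: int = 200) -> List[int]:
    """Value iteration for a human planner with discount factor `gamma`."""
    goal = N_STATES - 1
    values = [0.0] * N_STATES
    policy = [REST] * N_STATES  # the goal state is absorbing and stays REST
    for _ in range(sweeps):
        new_values = values[:]
        for s in range(goal):
            bonus = GOAL_REWARD if s + 1 == goal else 0.0
            q_work = -EFFORT_COST + bonus + gamma * values[s + 1]
            q_rest = gamma * values[s]  # resting is free but makes no progress
            policy[s] = WORK if q_work > q_rest else REST
            new_values[s] = max(q_work, q_rest)
        values = new_values
    return policy


def reaches_goal(policy: List[int], max_steps: int = 20) -> bool:
    """Roll the deterministic policy out from state 0."""
    state, goal = 0, N_STATES - 1
    for _ in range(max_steps):
        if state == goal:
            return True
        if policy[state] == WORK:
            state += 1
    return state == goal


def intervene(gamma: float, boost: float) -> float:
    """Hypothetical AI action: raise the human's discount factor, capped below 1."""
    return min(gamma + boost, 0.99)


if __name__ == "__main__":
    myopic_gamma = 0.5
    print("before:", reaches_goal(plan_human_policy(myopic_gamma)))
    helped_gamma = intervene(myopic_gamma, boost=0.4)
    print("after: ", reaches_goal(plan_human_policy(helped_gamma)))
```

Under these assumed numbers, the human with a discount factor of 0.5 rests in the first two states and never reaches the goal, while the same human with a discount factor of 0.9 works at every step. The environment is identical in both cases; only the parameter the AI intervenes on differs.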
