Explanation Through Reward Model Reconciliation using POMDP Tree Search

2023 IEEE International Conference on Assured Autonomy (ICAA). Pub Date: 2023-05-01. DOI: 10.1109/ICAA58325.2023.00027

Benjamin D. Kraske, Anshu Saksena, A. Buczak, Zachary Sunberg

Abstract

As artificial intelligence (AI) algorithms are increasingly used in mission-critical applications, promoting user trust in these systems will be essential to their success. Ensuring that users understand the models over which algorithms reason promotes that trust. This work seeks to reconcile differences between the reward model that an algorithm uses for online partially observable Markov decision process (POMDP) planning and the implicit reward model assumed by a human user. Action discrepancies, that is, differences between the decisions made by an algorithm and those made by a user, are leveraged to estimate the user's objectives, expressed as weightings of a reward function.
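To make the idea of estimating reward weightings from an action discrepancy concrete, the following is a minimal sketch and not the authors' method. It assumes a reward that is linear in a small feature vector, and it assumes the planner's tree search can report, for each root action, an estimated discounted return per feature (the hypothetical `feature_returns` dictionary). A coarse grid search over the weight simplex then stands in for whatever estimator the paper actually uses. All names and numbers here are illustrative.

```python
from itertools import product


def q_values(weights, feature_returns):
    """Q(a) = w . psi(a) for each root action, assuming a linear reward."""
    return {
        action: sum(w * f for w, f in zip(weights, psi))
        for action, psi in feature_returns.items()
    }


def best_action(weights, feature_returns):
    """Greedy root action under the given reward weights."""
    q = q_values(weights, feature_returns)
    return max(q, key=q.get)


def simplex_grid(dim, steps):
    """Yield weight vectors with entries k/steps that are >= 0 and sum to 1."""
    for combo in product(range(steps + 1), repeat=dim):
        if sum(combo) == steps:
            yield tuple(k / steps for k in combo)


def reconcile(alg_weights, feature_returns, user_action, steps=20):
    """Return the grid weights closest to alg_weights (squared distance)
    under which user_action is the greedy choice, or None if none exist."""
    candidates = [
        w for w in simplex_grid(len(alg_weights), steps)
        if best_action(w, feature_returns) == user_action
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda w: sum((x - y) ** 2 for x, y in zip(w, alg_weights)),
    )


# Hypothetical per-action feature returns: (progress, safety).
feature_returns = {"fast": (10.0, 2.0), "safe": (5.0, 8.0)}
alg_weights = (0.8, 0.2)   # assumed weights used by the planner
user_action = "safe"       # the user disagrees with the planner's choice

if best_action(alg_weights, feature_returns) != user_action:
    # An action discrepancy: estimate weights that explain the user's choice.
    print(reconcile(alg_weights, feature_returns, user_action))
```

In this toy setting the planner prefers "fast" while the user prefers "safe", and the search returns the nearest weighting under which "safe" becomes optimal. A single discrepancy only constrains the weights to a region, so the choice of the closest point is a design assumption of this sketch.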
