AI Alignment and Human Reward

Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. Pub Date: 2021-07-21. DOI: 10.1145/3461702.3462570

Patrick Butlin

Citations: 6

Abstract

According to a prominent approach to AI alignment, AI agents should be built to learn and promote human values. However, humans value things in several different ways: we have desires and preferences of various kinds, and if we engage in reinforcement learning, we also have reward functions. One research project to which this approach gives rise is therefore to say which of these various classes of human values should be promoted. This paper takes on part of this project by assessing the proposal that human reward functions should be the target for AI alignment. There is some reason to believe that powerful AI agents which were aligned to values of this form would help us to lead good lives, but there is also considerable uncertainty about this claim, arising from unresolved empirical and conceptual issues in human psychology.
