Agents Need Not Know Their Purpose

ArXiv Pub Date : 2024-02-15 DOI:10.48550/arXiv.2402.09734
Paulo Garcia
{"title":"Agents Need Not Know Their Purpose","authors":"Paulo Garcia","doi":"10.48550/arXiv.2402.09734","DOIUrl":null,"url":null,"abstract":"Ensuring artificial intelligence behaves in such a way that is aligned with human values is commonly referred to as the alignment challenge. Prior work has shown that rational agents, behaving in such a way that maximizes a utility function, will inevitably behave in such a way that is not aligned with human values, especially as their level of intelligence goes up. Prior work has also shown that there is no\"one true utility function\"; solutions must include a more holistic approach to alignment. This paper describes oblivious agents: agents that are architected in such a way that their effective utility function is an aggregation of a known and hidden sub-functions. The hidden component, to be maximized, is internally implemented as a black box, preventing the agent from examining it. The known component, to be minimized, is knowledge of the hidden sub-function. Architectural constraints further influence how agent actions can evolve its internal environment model. We show that an oblivious agent, behaving rationally, constructs an internal approximation of designers' intentions (i.e., infers alignment), and, as a consequence of its architecture and effective utility function, behaves in such a way that maximizes alignment; i.e., maximizing the approximated intention function. We show that, paradoxically, it does this for whatever utility function is used as the hidden component and, in contrast with extant techniques, chances of alignment actually improve as agent intelligence grows.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"26 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2402.09734","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Ensuring artificial intelligence behaves in such a way that is aligned with human values is commonly referred to as the alignment challenge. Prior work has shown that rational agents, behaving in such a way that maximizes a utility function, will inevitably behave in such a way that is not aligned with human values, especially as their level of intelligence goes up. Prior work has also shown that there is no"one true utility function"; solutions must include a more holistic approach to alignment. This paper describes oblivious agents: agents that are architected in such a way that their effective utility function is an aggregation of a known and hidden sub-functions. The hidden component, to be maximized, is internally implemented as a black box, preventing the agent from examining it. The known component, to be minimized, is knowledge of the hidden sub-function. Architectural constraints further influence how agent actions can evolve its internal environment model. We show that an oblivious agent, behaving rationally, constructs an internal approximation of designers' intentions (i.e., infers alignment), and, as a consequence of its architecture and effective utility function, behaves in such a way that maximizes alignment; i.e., maximizing the approximated intention function. We show that, paradoxically, it does this for whatever utility function is used as the hidden component and, in contrast with extant techniques, chances of alignment actually improve as agent intelligence grows.
代理人不必知道自己的目的
确保人工智能的行为与人类价值观相一致,通常被称为 "一致性挑战"。先前的研究表明,理性代理人的行为方式在使效用函数最大化的同时,不可避免地会与人类价值观不一致,尤其是当他们的智能水平不断提高时。先前的研究还表明,不存在 "一种真正的效用函数";解决方案必须包括一种更全面的协调方法。本文描述的是遗忘型代理:这种代理的架构方式使其有效效用函数成为已知和隐藏子函数的集合。要实现最大化的隐藏部分在内部是一个黑盒子,特工无法对其进行检查。要最小化的已知部分是对隐藏子函数的了解。架构限制进一步影响了代理行动如何发展其内部环境模型。我们证明,一个理性行为的遗忘代理会构建一个设计者意图的内部近似值(即推断对齐),并且,作为其架构和有效效用函数的结果,其行为方式会使对齐最大化;即,使近似意图函数最大化。我们的研究表明,矛盾的是,无论使用什么效用函数作为隐藏组件,它都能做到这一点,而且与现有技术不同的是,随着代理智能的提高,对齐的机会实际上也在提高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信