Addressing corrigibility in near-future AI systems

AI and Ethics. Pub Date: 2024-05-16. DOI: 10.1007/s43681-024-00484-9

Erez Firt

AI and Ethics, vol. 5, no. 2, pp. 1481-1490. Full text: https://link.springer.com/article/10.1007/s43681-024-00484-9


Abstract

One concern about future advanced autonomous AI systems is that they will be capable enough to resist external intervention even when such intervention is crucial, for example when the system is not behaving as intended. The rationale is that intelligent systems of this kind will be motivated to resist attempts to modify them or shut them down so that they can preserve their objectives. To mitigate these worries, we want future systems to be corrigible, i.e., to tolerate, cooperate with, or assist many forms of outside correction. One important reason for treating corrigibility as a safety property is that we already know how hard it is to construct AI agents with a sufficiently general utility function; and the more advanced and capable the agent is, the less likely it is that a complex baseline utility function built into it will be perfect from the start. In this paper, we try to achieve corrigibility in (at least) systems based on known or near-future (imaginable) technology, by endorsing and integrating different approaches to building AI-based systems. Our proposal replaces attempts to provide a corrigible utility function with a corrigible software architecture; this takes agency away from the RL agent (which now becomes an RL solver) and grants it to the system as a whole.
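The architectural idea in the last sentence of the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's design: the class names `RLSolver` and `CorrigibleSystem`, the value table, and the methods `propose`, `block`, `replace_solver`, and `shutdown` are all hypothetical. The point it tries to show is that the solver only ranks candidate actions, while execution and every correction path belong to the surrounding system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


class RLSolver:
    """Hypothetical solver: ranks actions by a learned value table.

    It has no access to the shutdown or correction controls.
    """

    def __init__(self, values: Dict[str, float]) -> None:
        self.values = values

    def propose(self, allowed: List[str]) -> Optional[str]:
        # Return the highest-valued allowed action, or None if none is known.
        candidates = [a for a in allowed if a in self.values]
        return max(candidates, key=self.values.get) if candidates else None


@dataclass
class CorrigibleSystem:
    """Hypothetical wrapper: the system, not the solver, decides what runs."""

    solver: RLSolver
    actions: List[str]
    halted: bool = False
    blocked: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    # Correction paths live outside the solver's objective.
    def shutdown(self) -> None:
        self.halted = True

    def block(self, action: str) -> None:
        self.blocked.add(action)

    def replace_solver(self, solver: RLSolver) -> None:
        self.solver = solver

    def step(self) -> Optional[str]:
        # Operator controls are checked before the solver is consulted.
        if self.halted:
            return None
        allowed = [a for a in self.actions if a not in self.blocked]
        action = self.solver.propose(allowed)
        if action is not None:
            self.log.append(action)
        return action


if __name__ == "__main__":
    system = CorrigibleSystem(RLSolver({"a": 1.0, "b": 2.0}), ["a", "b"])
    print(system.step())   # solver's preferred action
    system.block("b")      # operator correction
    print(system.step())   # next-best allowed action
    system.shutdown()      # operator shutdown
    print(system.step())   # nothing runs after shutdown
```

In this toy version the solver's value table never mentions shutdown or blocking, so it has nothing to optimize against them; whether a real system could keep that separation is exactly the kind of question the paper addresses.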
