当前人工智能失调的案例及其对未来风险的影响

IF 1.3 1区哲学 Q1 HISTORY & PHILOSOPHY OF SCIENCE

Synthese Pub Date : 2023-10-26 DOI:10.1007/s11229-023-04367-0

Leonard Dung

{"title":"当前人工智能失调的案例及其对未来风险的影响","authors":"Leonard Dung","doi":"10.1007/s11229-023-04367-0","DOIUrl":null,"url":null,"abstract":"Abstract How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem . Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models and game-playing agents are sometimes misaligned. These cases suggest that misalignment tends to have a variety of features: misalignment can be hard to detect, predict and remedy, it does not depend on a specific architecture or training paradigm, it tends to diminish a system’s usefulness and it is the default outcome of creating AI via machine learning. Subsequently, based on these features, I show that the risk of AI alignment magnifies with respect to more capable systems. Not only might more capable systems cause more harm when misaligned, aligning them should be expected to be more difficult than aligning current AI.","PeriodicalId":49452,"journal":{"name":"Synthese","volume":"17 1","pages":"0"},"PeriodicalIF":1.3000,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Current cases of AI misalignment and their implications for future risks\",\"authors\":\"Leonard Dung\",\"doi\":\"10.1007/s11229-023-04367-0\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem . Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models and game-playing agents are sometimes misaligned. These cases suggest that misalignment tends to have a variety of features: misalignment can be hard to detect, predict and remedy, it does not depend on a specific architecture or training paradigm, it tends to diminish a system’s usefulness and it is the default outcome of creating AI via machine learning. Subsequently, based on these features, I show that the risk of AI alignment magnifies with respect to more capable systems. Not only might more capable systems cause more harm when misaligned, aligning them should be expected to be more difficult than aligning current AI.\",\"PeriodicalId\":49452,\"journal\":{\"name\":\"Synthese\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2023-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Synthese\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s11229-023-04367-0\",\"RegionNum\":1,\"RegionCategory\":\"哲学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"HISTORY & PHILOSOPHY OF SCIENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Synthese","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11229-023-04367-0","RegionNum":1,"RegionCategory":"哲学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HISTORY & PHILOSOPHY OF SCIENCE","Score":null,"Total":0}

引用次数: 0

摘要

如何构建AI系统，使其能够追求设计师希望它们追求的目标?这就是对齐问题。许多作者都担心，随着时间的推移，随着研究的进步和系统变得越来越强大，不协调可能会导致灾难性的后果，甚至可能导致人类的灭绝或永久丧失权力。在本文中，我基于当前的不一致实例分析了这种风险的严重性。更具体地说，我认为当代大型语言模型和游戏代理有时是不一致的。这些案例表明，偏差往往具有多种特征:偏差可能难以检测、预测和补救，它不依赖于特定的架构或训练范例，它往往会降低系统的有用性，并且它是通过机器学习创建人工智能的默认结果。随后，基于这些特征，我展示了相对于更有能力的系统，AI对齐的风险会放大。更强大的系统在不协调时不仅会造成更大的伤害，而且调整它们应该比调整当前的AI更困难。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Current cases of AI misalignment and their implications for future risks

Abstract How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem . Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models and game-playing agents are sometimes misaligned. These cases suggest that misalignment tends to have a variety of features: misalignment can be hard to detect, predict and remedy, it does not depend on a specific architecture or training paradigm, it tends to diminish a system’s usefulness and it is the default outcome of creating AI via machine learning. Subsequently, based on these features, I show that the risk of AI alignment magnifies with respect to more capable systems. Not only might more capable systems cause more harm when misaligned, aligning them should be expected to be more difficult than aligning current AI.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Synthese 管理科学-科学史与科学哲学

CiteScore

3.30

自引率

13.30%

发文量

471

审稿时长

1 months

期刊介绍： Synthese is a philosophy journal focusing on contemporary issues in epistemology, philosophy of science, and related fields. More specifically, we divide our areas of interest into four groups: (1) epistemology, methodology, and philosophy of science, all broadly understood. (2) The foundations of logic and mathematics, where ‘logic’, ‘mathematics’, and ‘foundations’ are all broadly understood. (3) Formal methods in philosophy, including methods connecting philosophy to other academic fields. (4) Issues in ethics and the history and sociology of logic, mathematics, and science that contribute to the contemporary studies Synthese focuses on, as described in (1)-(3) above.