{"title":"AI safety: a climb to Armageddon?","authors":"Herman Cappelen, Josh Dever, John Hawthorne","doi":"10.1007/s11098-025-02297-w","DOIUrl":null,"url":null,"abstract":"<p>This paper presents an argument that certain AI safety measures, rather than\nmitigating existential risk, may instead exacerbate it. Under certain key assumptions -\nthe inevitability of AI failure, the expected correlation between an AI system's power at\nthe point of failure and the severity of the resulting harm, and the tendency of safety\nmeasures to enable AI systems to become more powerful before failing - safety efforts\nhave negative expected utility. The paper examines three response strategies:\nOptimism, Mitigation, and Holism. Each faces challenges stemming from intrinsic\nfeatures of the AI safety landscape that we term Bottlenecking, the Perfection Barrier,\nand Equilibrium Fluctuation. The surprising robustness of the argument forces a reexamination\nof core assumptions around AI safety and points to several avenues for\nfurther research.</p>","PeriodicalId":48305,"journal":{"name":"PHILOSOPHICAL STUDIES","volume":"37 1","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PHILOSOPHICAL STUDIES","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11098-025-02297-w","RegionNum":1,"RegionCategory":"哲学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"PHILOSOPHY","Score":null,"Total":0}
引用次数: 0
Abstract
This paper presents an argument that certain AI safety measures, rather than mitigating existential risk, may instead exacerbate it. Under certain key assumptions (the inevitability of AI failure, the expected correlation between an AI system's power at the point of failure and the severity of the resulting harm, and the tendency of safety measures to enable AI systems to become more powerful before failing), safety efforts have negative expected utility. The paper examines three response strategies: Optimism, Mitigation, and Holism. Each faces challenges stemming from intrinsic features of the AI safety landscape that we term Bottlenecking, the Perfection Barrier, and Equilibrium Fluctuation. The surprising robustness of the argument forces a re-examination of core assumptions around AI safety and points to several avenues for further research.
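The decision-theoretic structure of the abstract's central claim can be sketched as follows. This is our gloss, not the paper's own formalism; the variables $p$, $H$, and $s$ are illustrative. Let $p$ denote an AI system's power at the point of failure and $H(p)$ the harm that failure causes, with $H$ strictly increasing in $p$. If failure is certain and a safety measure $s$ raises the power level at failure from $p_0$ to $p_s > p_0$, then the marginal expected utility of adopting $s$ is

$$\mathbb{E}[U \mid s] - \mathbb{E}[U \mid \neg s] = -H(p_s) - \bigl(-H(p_0)\bigr) = H(p_0) - H(p_s) < 0,$$

so under these assumptions the safety measure makes the expected outcome worse, which is the sense in which safety efforts are said to have negative expected utility.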
About the journal
Philosophical Studies was founded in 1950 by Herbert Feigl and Wilfrid Sellars to provide a periodical dedicated to work in analytic philosophy. The journal remains devoted exclusively to the publication of papers in analytic philosophy. Papers applying formal techniques to philosophical problems are welcome. The principal aim is to publish articles that are models of clarity and precision in dealing with significant philosophical issues, keeping readers abreast of the central issues and problems of contemporary analytic philosophy.
Double-blind review procedure
The journal follows a double-blind reviewing procedure. Authors are therefore requested to place their name and affiliation on a separate page. Self-identifying citations and references in the article text should either be avoided or left blank when manuscripts are first submitted. Authors are responsible for reinserting self-identifying citations and references when manuscripts are prepared for final submission.