{"title":"AI safety: a climb to Armageddon?","authors":"Herman Cappelen, Josh Dever, John Hawthorne","doi":"10.1007/s11098-025-02297-w","DOIUrl":null,"url":null,"abstract":"<p>This paper presents an argument that certain AI safety measures, rather than\nmitigating existential risk, may instead exacerbate it. Under certain key assumptions -\nthe inevitability of AI failure, the expected correlation between an AI system's power at\nthe point of failure and the severity of the resulting harm, and the tendency of safety\nmeasures to enable AI systems to become more powerful before failing - safety efforts\nhave negative expected utility. The paper examines three response strategies:\nOptimism, Mitigation, and Holism. Each faces challenges stemming from intrinsic\nfeatures of the AI safety landscape that we term Bottlenecking, the Perfection Barrier,\nand Equilibrium Fluctuation. The surprising robustness of the argument forces a reexamination\nof core assumptions around AI safety and points to several avenues for\nfurther research.</p>","PeriodicalId":48305,"journal":{"name":"PHILOSOPHICAL STUDIES","volume":"37 1","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PHILOSOPHICAL STUDIES","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11098-025-02297-w","RegionNum":1,"RegionCategory":"哲学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"PHILOSOPHY","Score":null,"Total":0}
引用次数: 0
Abstract
This paper presents an argument that certain AI safety measures, rather than mitigating existential risk, may instead exacerbate it. Under certain key assumptions (the inevitability of AI failure, the expected correlation between an AI system's power at the point of failure and the severity of the resulting harm, and the tendency of safety measures to enable AI systems to become more powerful before failing), safety efforts have negative expected utility. The paper examines three response strategies: Optimism, Mitigation, and Holism. Each faces challenges stemming from intrinsic features of the AI safety landscape that we term Bottlenecking, the Perfection Barrier, and Equilibrium Fluctuation. The surprising robustness of the argument forces a re-examination of core assumptions around AI safety and points to several avenues for further research.
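The decision-theoretic structure of the abstract's central claim can be sketched as follows. This is our gloss, not the paper's own formalism; the variables $p$, $H$, and $s$ are illustrative. Let $p$ denote an AI system's power at the point of failure and $H(p)$ the harm that failure causes, with $H$ strictly increasing in $p$. If failure is certain and a safety measure $s$ raises the power level at failure from $p_0$ to $p_s > p_0$, then the marginal expected utility of adopting $s$ is

$$\mathbb{E}[U \mid s] - \mathbb{E}[U \mid \neg s] = -H(p_s) - \bigl(-H(p_0)\bigr) = H(p_0) - H(p_s) < 0,$$

so under these assumptions the safety measure makes the expected outcome worse, which is the sense in which safety efforts are said to have negative expected utility.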
About the journal
Philosophical Studies was founded in 1950 by Herbert Feigl and Wilfrid Sellars to provide a periodical dedicated to work in analytic philosophy. The journal remains devoted exclusively to the publication of papers in analytic philosophy. Papers applying formal techniques to philosophical problems are welcome. The principal aim is to publish articles that are models of clarity and precision in dealing with significant philosophical issues, keeping readers abreast of the central issues and problems of contemporary analytic philosophy.
Double-blind review procedure
The journal follows a double-blind reviewing procedure. Authors are therefore requested to place their name and affiliation on a separate page. Self-identifying citations and references in the article text should either be avoided or left blank when manuscripts are first submitted. Authors are responsible for reinserting self-identifying citations and references when manuscripts are prepared for final submission.