Mapping Technical Safety Research at AI Companies: A literature review and incentives analysis

Oscar Delaney, Oliver Guest, Zoe Williams
{"title":"Mapping Technical Safety Research at AI Companies: A literature review and incentives analysis","authors":"Oscar Delaney, Oliver Guest, Zoe Williams","doi":"arxiv-2409.07878","DOIUrl":null,"url":null,"abstract":"As artificial intelligence (AI) systems become more advanced, concerns about\nlarge-scale risks from misuse or accidents have grown. This report analyzes the\ntechnical research into safe AI development being conducted by three leading AI\ncompanies: Anthropic, Google DeepMind, and OpenAI. We define safe AI development as developing AI systems that are unlikely to\npose large-scale misuse or accident risks. This encompasses a range of\ntechnical approaches aimed at ensuring AI systems behave as intended and do not\ncause unintended harm, even as they are made more capable and autonomous. We analyzed all papers published by the three companies from January 2022 to\nJuly 2024 that were relevant to safe AI development, and categorized the 61\nincluded papers into eight safety approaches. Additionally, we noted three\ncategories representing nascent approaches explored by academia and civil\nsociety, but not currently represented in any papers by the three companies.\nOur analysis reveals where corporate attention is concentrated and where\npotential gaps lie. Some AI research may stay unpublished for good reasons, such as to not inform\nadversaries about security techniques they would need to overcome to misuse AI\nsystems. Therefore, we also considered the incentives that AI companies have to\nresearch each approach. In particular, we considered reputational effects,\nregulatory burdens, and whether the approaches could make AI systems more\nuseful. We identified three categories where there are currently no or few papers and\nwhere we do not expect AI companies to become more incentivized to pursue this\nresearch in the future. These are multi-agent safety, model organisms of\nmisalignment, and safety by design. Our findings provide an indication that\nthese approaches may be slow to progress without funding or efforts from\ngovernment, civil society, philanthropists, or academia.","PeriodicalId":501112,"journal":{"name":"arXiv - CS - Computers and Society","volume":"10 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computers and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07878","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

As artificial intelligence (AI) systems become more advanced, concerns about large-scale risks from misuse or accidents have grown. This report analyzes the technical research into safe AI development being conducted by three leading AI companies: Anthropic, Google DeepMind, and OpenAI. We define safe AI development as developing AI systems that are unlikely to pose large-scale misuse or accident risks. This encompasses a range of technical approaches aimed at ensuring AI systems behave as intended and do not cause unintended harm, even as they are made more capable and autonomous. We analyzed all papers published by the three companies from January 2022 to July 2024 that were relevant to safe AI development, and categorized the 61 included papers into eight safety approaches. Additionally, we noted three categories representing nascent approaches explored by academia and civil society, but not currently represented in any papers by the three companies. Our analysis reveals where corporate attention is concentrated and where potential gaps lie. Some AI research may stay unpublished for good reasons, such as to not inform adversaries about security techniques they would need to overcome to misuse AI systems. Therefore, we also considered the incentives that AI companies have to research each approach. In particular, we considered reputational effects, regulatory burdens, and whether the approaches could make AI systems more useful. We identified three categories where there are currently no or few papers and where we do not expect AI companies to become more incentivized to pursue this research in the future. These are multi-agent safety, model organisms of misalignment, and safety by design. Our findings provide an indication that these approaches may be slow to progress without funding or efforts from government, civil society, philanthropists, or academia.