{"title":"Mapping Technical Safety Research at AI Companies: A literature review and incentives analysis","authors":"Oscar Delaney, Oliver Guest, Zoe Williams","doi":"arxiv-2409.07878","DOIUrl":null,"url":null,"abstract":"As artificial intelligence (AI) systems become more advanced, concerns about\nlarge-scale risks from misuse or accidents have grown. This report analyzes the\ntechnical research into safe AI development being conducted by three leading AI\ncompanies: Anthropic, Google DeepMind, and OpenAI. We define safe AI development as developing AI systems that are unlikely to\npose large-scale misuse or accident risks. This encompasses a range of\ntechnical approaches aimed at ensuring AI systems behave as intended and do not\ncause unintended harm, even as they are made more capable and autonomous. We analyzed all papers published by the three companies from January 2022 to\nJuly 2024 that were relevant to safe AI development, and categorized the 61\nincluded papers into eight safety approaches. Additionally, we noted three\ncategories representing nascent approaches explored by academia and civil\nsociety, but not currently represented in any papers by the three companies.\nOur analysis reveals where corporate attention is concentrated and where\npotential gaps lie. Some AI research may stay unpublished for good reasons, such as to not inform\nadversaries about security techniques they would need to overcome to misuse AI\nsystems. Therefore, we also considered the incentives that AI companies have to\nresearch each approach. In particular, we considered reputational effects,\nregulatory burdens, and whether the approaches could make AI systems more\nuseful. We identified three categories where there are currently no or few papers and\nwhere we do not expect AI companies to become more incentivized to pursue this\nresearch in the future. These are multi-agent safety, model organisms of\nmisalignment, and safety by design. Our findings provide an indication that\nthese approaches may be slow to progress without funding or efforts from\ngovernment, civil society, philanthropists, or academia.","PeriodicalId":501112,"journal":{"name":"arXiv - CS - Computers and Society","volume":"10 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computers and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07878","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
As artificial intelligence (AI) systems become more advanced, concerns about large-scale risks from misuse or accidents have grown. This report analyzes the technical research into safe AI development being conducted by three leading AI companies: Anthropic, Google DeepMind, and OpenAI.

We define safe AI development as developing AI systems that are unlikely to pose large-scale misuse or accident risks. This encompasses a range of technical approaches aimed at ensuring AI systems behave as intended and do not cause unintended harm, even as they are made more capable and autonomous.

We analyzed all papers published by the three companies from January 2022 to July 2024 that were relevant to safe AI development, and categorized the 61 included papers into eight safety approaches. Additionally, we noted three categories representing nascent approaches explored by academia and civil society, but not currently represented in any papers by the three companies. Our analysis reveals where corporate attention is concentrated and where potential gaps lie.

Some AI research may stay unpublished for good reasons, such as to avoid informing adversaries about the security techniques they would need to overcome to misuse AI systems. Therefore, we also considered the incentives that AI companies have to research each approach. In particular, we considered reputational effects, regulatory burdens, and whether the approaches could make AI systems more useful.

We identified three categories where there are currently few or no papers and where we do not expect AI companies to become more incentivized to pursue this research in the future. These are multi-agent safety, model organisms of misalignment, and safety by design. Our findings indicate that these approaches may be slow to progress without funding or efforts from government, civil society, philanthropists, or academia.