{"title":"Disagreement, AI alignment, and bargaining","authors":"Harry R. Lloyd","doi":"10.1007/s11098-024-02224-5","DOIUrl":null,"url":null,"abstract":"<p>New AI technologies have the potential to cause unintended harms in diverse domains including warfare, judicial sentencing, medicine and governance. One strategy for realising the benefits of AI whilst avoiding its potential dangers is to ensure that new AIs are properly ‘aligned’ with some form of ‘alignment target.’ One danger of this strategy is that–dependent on the alignment target chosen–our AIs might optimise for objectives that reflect the values only of a certain subset of society, and that do not take into account alternative views about what constitutes desirable and safe behaviour for AI agents. In response to this problem, several AI ethicists have suggested alignment targets that are designed to be sensitive to widespread normative disagreement amongst the relevant stakeholders. Authors inspired by voting theory have suggested that AIs should be aligned with the verdicts of actual or simulated ‘<i>moral parliaments</i>’ whose members represent the normative views of the relevant stakeholders. Other authors inspired by decision theory and the philosophical literature on moral uncertainty have suggested that AIs should maximise socially expected choiceworthiness. In this paper, I argue that both of these proposals face several important problems. In particular, they fail to select attractive ‘compromise options’ in cases where such options are available. I go on to propose and defend an alternative, bargaining-theoretic alignment target, which avoids the problems associated with the voting- and decision-theoretic approaches.</p>","PeriodicalId":48305,"journal":{"name":"PHILOSOPHICAL STUDIES","volume":"76 1","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PHILOSOPHICAL STUDIES","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11098-024-02224-5","RegionNum":1,"RegionCategory":"哲学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"PHILOSOPHY","Score":null,"Total":0}
引用次数: 0
Abstract
New AI technologies have the potential to cause unintended harms in diverse domains including warfare, judicial sentencing, medicine and governance. One strategy for realising the benefits of AI whilst avoiding its potential dangers is to ensure that new AIs are properly ‘aligned’ with some form of ‘alignment target.’ One danger of this strategy is that, depending on the alignment target chosen, our AIs might optimise for objectives that reflect the values of only a certain subset of society, and that do not take into account alternative views about what constitutes desirable and safe behaviour for AI agents. In response to this problem, several AI ethicists have suggested alignment targets that are designed to be sensitive to widespread normative disagreement amongst the relevant stakeholders. Authors inspired by voting theory have suggested that AIs should be aligned with the verdicts of actual or simulated ‘moral parliaments’ whose members represent the normative views of the relevant stakeholders. Other authors, inspired by decision theory and the philosophical literature on moral uncertainty, have suggested that AIs should maximise socially expected choiceworthiness. In this paper, I argue that both of these proposals face several important problems. In particular, they fail to select attractive ‘compromise options’ in cases where such options are available. I go on to propose and defend an alternative, bargaining-theoretic alignment target, which avoids the problems associated with the voting- and decision-theoretic approaches.
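To make the abstract's central contrast concrete, here is a minimal toy sketch in Python. It is not drawn from the paper itself: the options, choiceworthiness numbers, weights, and disagreement payoffs are all hypothetical. It compares a "maximise socially expected choiceworthiness" rule, modelled as a stakeholder-weighted sum, with a Nash-style bargaining rule that maximises the product of each group's gain over an assumed disagreement point. In this contrived case the weighted sum is captured by one group's high-intensity valuations and selects a polarising option, whereas the bargaining rule selects the intuitive compromise.

```python
# Toy sketch (illustrative only, not the paper's own examples):
# two stakeholder groups assign choiceworthiness to three candidate
# AI policies. "B" is the intuitive compromise option.

options = ["A", "B", "C"]

# Hypothetical choiceworthiness scores. Group 1 rates options on a
# much larger scale (higher intensity) than Group 2.
choiceworthiness = {
    "group_1": {"A": 100.0, "B": 55.0, "C": 0.0},
    "group_2": {"A": 0.0,   "B": 9.0,  "C": 10.0},
}
weights = {"group_1": 0.5, "group_2": 0.5}        # equal stakeholder weights
disagreement = {"group_1": 0.0, "group_2": 0.0}   # assumed fallback payoffs

def socially_expected_choiceworthiness(option):
    """Stakeholder-weighted sum of choiceworthiness for the option."""
    return sum(weights[g] * choiceworthiness[g][option] for g in weights)

def nash_product(option):
    """Product over groups of each group's gain above its disagreement payoff."""
    product = 1.0
    for g in weights:
        product *= choiceworthiness[g][option] - disagreement[g]
    return product

# The weighted-sum rule is swamped by Group 1's intense valuations:
print(max(options, key=socially_expected_choiceworthiness))  # -> "A"
# The bargaining rule zeroes out any option that leaves a group at
# its disagreement payoff, and so selects the compromise:
print(max(options, key=nash_product))                         # -> "B"
```

One design note on the contrast: the Nash product is invariant under rescaling any single group's choiceworthiness scores (multiplying Group 1's scores by a positive constant multiplies every option's product by that same constant, leaving the ranking unchanged), whereas the weighted sum is not. This is one reason a bargaining-theoretic rule can be less sensitive to how intensely each faction happens to express its values.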
About the Journal
Philosophical Studies was founded in 1950 by Herbert Feigl and Wilfrid Sellars to provide a periodical dedicated to work in analytic philosophy. The journal remains devoted to the publication of papers in exclusively analytic philosophy. Papers applying formal techniques to philosophical problems are welcome. The principal aim is to publish articles that are models of clarity and precision in dealing with significant philosophical issues. It is intended that readers of the journal will be kept abreast of the central issues and problems of contemporary analytic philosophy.
Double-blind review procedure
The journal follows a double-blind reviewing procedure. Authors are therefore requested to place their name and affiliation on a separate page. Self-identifying citations and references in the article text should either be avoided or left blank when manuscripts are first submitted. Authors are responsible for reinserting self-identifying citations and references when manuscripts are prepared for final submission.