{"title":"Deception and manipulation in generative AI","authors":"Christian Tarsney","doi":"10.1007/s11098-024-02259-8","DOIUrl":null,"url":null,"abstract":"<p>Large language models now possess human-level linguistic abilities in many contexts. This raises the concern that they can be used to deceive and manipulate on unprecedented scales, for instance spreading political misinformation on social media. In future, agentic AI systems might also deceive and manipulate humans for their own purposes. In this paper, first, I argue that AI-generated content should be subject to stricter standards against deception and manipulation than we ordinarily apply to humans. Second, I offer new characterizations of AI deception and manipulation meant to support such standards, according to which a statement is deceptive (resp. manipulative) if it leads human addressees away from the beliefs (resp. choices) they would endorse under “semi-ideal” conditions. Third, I propose two measures to guard against AI deception and manipulation, inspired by this characterization: “extreme transparency” requirements for AI-generated content and “defensive systems” that, among other things, annotate AI-generated statements with contextualizing information. Finally, I consider to what extent these measures can protect against deceptive behavior in future, agentic AI systems.</p>","PeriodicalId":48305,"journal":{"name":"PHILOSOPHICAL STUDIES","volume":"4 1","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PHILOSOPHICAL STUDIES","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11098-024-02259-8","RegionNum":1,"RegionCategory":"哲学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"PHILOSOPHY","Score":null,"Total":0}
Abstract
Large language models now possess human-level linguistic abilities in many contexts. This raises the concern that they can be used to deceive and manipulate on unprecedented scales, for instance spreading political misinformation on social media. In future, agentic AI systems might also deceive and manipulate humans for their own purposes. In this paper, first, I argue that AI-generated content should be subject to stricter standards against deception and manipulation than we ordinarily apply to humans. Second, I offer new characterizations of AI deception and manipulation meant to support such standards, according to which a statement is deceptive (resp. manipulative) if it leads human addressees away from the beliefs (resp. choices) they would endorse under “semi-ideal” conditions. Third, I propose two measures to guard against AI deception and manipulation, inspired by this characterization: “extreme transparency” requirements for AI-generated content and “defensive systems” that, among other things, annotate AI-generated statements with contextualizing information. Finally, I consider to what extent these measures can protect against deceptive behavior in future, agentic AI systems.
About the journal
Philosophical Studies was founded in 1950 by Herbert Feigl and Wilfrid Sellars to provide a periodical dedicated to work in analytic philosophy. The journal remains devoted exclusively to the publication of papers in analytic philosophy. Papers applying formal techniques to philosophical problems are welcome. The principal aim is to publish articles that are models of clarity and precision in dealing with significant philosophical issues, keeping readers abreast of the central issues and problems of contemporary analytic philosophy.
Double-blind review procedure
The journal follows a double-blind reviewing procedure. Authors are therefore requested to place their name and affiliation on a separate page. Self-identifying citations and references in the article text should either be avoided or left blank when manuscripts are first submitted. Authors are responsible for reinserting self-identifying citations and references when manuscripts are prepared for final submission.