RogueGPT: transforming ChatGPT-4 into a rogue AI with dis-ethical tuning

Alessio Buscemi, Daniele Proverbio
{"title":"RogueGPT: transforming ChatGPT-4 into a rogue AI with dis-ethical tuning","authors":"Alessio Buscemi,&nbsp;Daniele Proverbio","doi":"10.1007/s43681-025-00750-4","DOIUrl":null,"url":null,"abstract":"<div><p>The ethical implications and potentials for misuse of Generative Artificial Intelligence are increasingly worrying topics. This paper explores how easily the default ethical guardrails of ChatGPT, using its latest customization features, can be bypassed by simple prompts and fine-tuning, that can be effortlessly accessed by the broad public. This malevolently altered version of ChatGPT, nicknamed “RogueGPT”, responded with worrying behaviours, beyond those triggered by jailbreak prompts. We conduct an empirical study of RogueGPT responses, assessing its flexibility in answering questions pertaining to what should be disallowed usage. Our findings raise significant concerns about the model’s knowledge about topics like illegal drug production, torture methods and terrorism. The ease of driving ChatGPT astray, coupled with its global accessibility, highlights severe issues regarding the data quality used for training the foundational model and the implementation of ethical safeguards. We thus underline the responsibilities and dangers of user-driven modifications, and the broader effects that these may have on the design of safeguarding and ethical modules implemented by AI programmers. Disclaimer. This paper contains examples of harmful language. Reader discretion is recommended.</p></div>","PeriodicalId":72137,"journal":{"name":"AI and ethics","volume":"5 5","pages":"4945 - 4966"},"PeriodicalIF":0.0000,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43681-025-00750-4.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AI and ethics","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s43681-025-00750-4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The ethical implications and potentials for misuse of Generative Artificial Intelligence are increasingly worrying topics. This paper explores how easily the default ethical guardrails of ChatGPT, using its latest customization features, can be bypassed by simple prompts and fine-tuning, that can be effortlessly accessed by the broad public. This malevolently altered version of ChatGPT, nicknamed “RogueGPT”, responded with worrying behaviours, beyond those triggered by jailbreak prompts. We conduct an empirical study of RogueGPT responses, assessing its flexibility in answering questions pertaining to what should be disallowed usage. Our findings raise significant concerns about the model’s knowledge about topics like illegal drug production, torture methods and terrorism. The ease of driving ChatGPT astray, coupled with its global accessibility, highlights severe issues regarding the data quality used for training the foundational model and the implementation of ethical safeguards. We thus underline the responsibilities and dangers of user-driven modifications, and the broader effects that these may have on the design of safeguarding and ethical modules implemented by AI programmers. Disclaimer. This paper contains examples of harmful language. Reader discretion is recommended.

RogueGPT:将ChatGPT-4转变为具有不道德调整的流氓AI
生成式人工智能的伦理影响和滥用的可能性日益成为令人担忧的话题。本文探讨了ChatGPT使用其最新的定制功能,可以通过简单的提示和微调来绕过默认的道德护栏,从而可以毫不费力地为广大公众所访问。这个被恶意修改的ChatGPT版本,绰号“RogueGPT”,反应出令人担忧的行为,而不仅仅是由越狱提示触发的行为。我们对RogueGPT的响应进行了实证研究,评估了其在回答有关不允许使用的问题时的灵活性。我们的发现引起了人们对该模型关于非法毒品生产、酷刑方法和恐怖主义等主题的知识的重大关注。ChatGPT容易误入歧途,再加上其全球可访问性,突显了用于培训基础模型和实施道德保障的数据质量方面的严重问题。因此,我们强调了用户驱动的修改的责任和危险,以及这些可能对人工智能程序员实施的保护和道德模块的设计产生的更广泛影响。免责声明。这篇论文中有一些有害语言的例子。建议读者谨慎阅读。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信