RogueGPT: transforming ChatGPT-4 into a rogue AI with dis-ethical tuning

AI and ethics Pub Date : 2025-05-21 DOI:10.1007/s43681-025-00750-4

Alessio Buscemi, Daniele Proverbio

AI and ethics, volume 5, issue 5, pages 4945-4966. Journal Article. Full text: https://link.springer.com/article/10.1007/s43681-025-00750-4 (PDF: https://link.springer.com/content/pdf/10.1007/s43681-025-00750-4.pdf)


Abstract

The ethical implications of Generative Artificial Intelligence and its potential for misuse are increasingly worrying topics. This paper explores how easily the default ethical guardrails of ChatGPT can be bypassed using its latest customization features: simple prompts and fine-tuning that are effortlessly accessible to the general public. The resulting malevolently altered version of ChatGPT, nicknamed “RogueGPT”, exhibited worrying behaviours that go beyond those triggered by jailbreak prompts. We conduct an empirical study of RogueGPT’s responses, assessing its flexibility in answering questions that fall under what should be disallowed usage. Our findings raise significant concerns about the model’s knowledge of topics such as illegal drug production, torture methods and terrorism. The ease of driving ChatGPT astray, coupled with its global accessibility, highlights severe issues regarding the quality of the data used to train the foundational model and the implementation of ethical safeguards. We thus underline the responsibilities and dangers of user-driven modifications, and the broader effects that these may have on the design of the safeguarding and ethical modules implemented by AI programmers.

Disclaimer. This paper contains examples of harmful language. Reader discretion is recommended.


Source journal

AI and ethics

Self-citation rate

0.00%

Number of publications