The Unprecedented Surge in Generative AI: Empirical Analysis of Trusted and Malicious Large Language Models (LLMs)

Impact Factor 1.9 · Zone 4 (Engineering & Technology) · JCR Q3: ENGINEERING, ELECTRICAL & ELECTRONIC
Aditya K. Sood;Sherali Zeadally
IEEE Technology and Society Magazine, vol. 44, no. 3, pp. 98-108. Published 2025-07-23. Journal Article.
DOI: 10.1109/MTS.2025.3582667
URL: https://ieeexplore.ieee.org/document/11091436/
Citations: 0

Abstract

Trusted large language models (LLMs) inherit ethical guidelines to prevent the generation of harmful content, whereas malicious LLMs are engineered to generate unethical and toxic responses. Both trusted and malicious LLMs use guardrails in different contexts, per the requirements of developers and attackers, respectively. We explore guardrail implementation in LLMs by conducting an empirical analysis that uses prompts to assess the effectiveness of guardrails. Our results revealed that guardrails deployed in trusted LLMs could be bypassed using prompt manipulation techniques such as “pretend” and “persist” to generate harmful content. We also discovered that malicious LLMs still deploy weak guardrails to evade detection by generating human-like content. This empirical analysis provides insights into the design of malicious and trusted LLMs. We also propose recommendations for defending against prompt manipulation and guardrail bypass when designing LLMs.
Source journal: IEEE Technology and Society Magazine
Category: Engineering & Technology - Engineering: Electrical & Electronic
CiteScore: 3.00
Self-citation rate: 13.60%
Articles published: 72
Review time: >12 weeks
About the journal: IEEE Technology and Society Magazine invites feature articles (refereed), special articles, and commentaries on topics within the scope of the IEEE Society on Social Implications of Technology, in the broad areas of social implications of electrotechnology, history of electrotechnology, and engineering ethics.