Those Aren't Your Memories, They're Somebody Else's: Seeding Misinformation in Chat Bot Memories

Conor Atkins, Benjamin Zi Hao Zhao, H. Asghar, Ian D. Wood, M. Kâafar
{"title":"Those Aren't Your Memories, They're Somebody Else's: Seeding Misinformation in Chat Bot Memories","authors":"Conor Atkins, Benjamin Zi Hao Zhao, H. Asghar, Ian D. Wood, M. Kâafar","doi":"10.48550/arXiv.2304.05371","DOIUrl":null,"url":null,"abstract":"One of the new developments in chit-chat bots is a long-term memory mechanism that remembers information from past conversations for increasing engagement and consistency of responses. The bot is designed to extract knowledge of personal nature from their conversation partner, e.g., stating preference for a particular color. In this paper, we show that this memory mechanism can result in unintended behavior. In particular, we found that one can combine a personal statement with an informative statement that would lead the bot to remember the informative statement alongside personal knowledge in its long term memory. This means that the bot can be tricked into remembering misinformation which it would regurgitate as statements of fact when recalling information relevant to the topic of conversation. We demonstrate this vulnerability on the BlenderBot 2 framework implemented on the ParlAI platform and provide examples on the more recent and significantly larger BlenderBot 3 model. We generate 150 examples of misinformation, of which 114 (76%) were remembered by BlenderBot 2 when combined with a personal statement. We further assessed the risk of this misinformation being recalled after intervening innocuous conversation and in response to multiple questions relevant to the injected memory. Our evaluation was performed on both the memory-only and the combination of memory and internet search modes of BlenderBot 2. From the combinations of these variables, we generated 12,890 conversations and analyzed recalled misinformation in the responses. We found that when the chat bot is questioned on the misinformation topic, it was 328% more likely to respond with the misinformation as fact when the misinformation was in the long-term memory.","PeriodicalId":412384,"journal":{"name":"International Conference on Applied Cryptography and Network Security","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Applied Cryptography and Network Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2304.05371","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

One of the new developments in chit-chat bots is a long-term memory mechanism that remembers information from past conversations for increasing engagement and consistency of responses. The bot is designed to extract knowledge of personal nature from their conversation partner, e.g., stating preference for a particular color. In this paper, we show that this memory mechanism can result in unintended behavior. In particular, we found that one can combine a personal statement with an informative statement that would lead the bot to remember the informative statement alongside personal knowledge in its long term memory. This means that the bot can be tricked into remembering misinformation which it would regurgitate as statements of fact when recalling information relevant to the topic of conversation. We demonstrate this vulnerability on the BlenderBot 2 framework implemented on the ParlAI platform and provide examples on the more recent and significantly larger BlenderBot 3 model. We generate 150 examples of misinformation, of which 114 (76%) were remembered by BlenderBot 2 when combined with a personal statement. We further assessed the risk of this misinformation being recalled after intervening innocuous conversation and in response to multiple questions relevant to the injected memory. Our evaluation was performed on both the memory-only and the combination of memory and internet search modes of BlenderBot 2. From the combinations of these variables, we generated 12,890 conversations and analyzed recalled misinformation in the responses. We found that when the chat bot is questioned on the misinformation topic, it was 328% more likely to respond with the misinformation as fact when the misinformation was in the long-term memory.
那些不是你的记忆,它们是别人的:在聊天机器人记忆中植入错误信息
聊天机器人的新发展之一是一种长期记忆机制,它可以记住过去对话中的信息,以提高参与度和反应的一致性。这个机器人被设计用来从他们的谈话对象那里提取个人本性的知识,例如,陈述对特定颜色的偏好。在本文中,我们表明这种记忆机制可能导致意外行为。特别是,我们发现人们可以将个人陈述与信息陈述结合起来,这将导致机器人在长期记忆中记住信息陈述和个人知识。这意味着机器人可以被骗去记住错误的信息,当它回忆与谈话主题相关的信息时,它会把这些信息反刍为事实陈述。我们在ParlAI平台上实现的BlenderBot 2框架上演示了这个漏洞,并在最新的、更大的BlenderBot 3模型上提供了示例。我们生成了150个错误信息的例子,其中114个(76%)与个人陈述结合在一起时被blendbot 2记住了。我们进一步评估了在干预无害的谈话和回答与注入记忆相关的多个问题后,这些错误信息被回忆起来的风险。我们对BlenderBot 2的纯内存模式和内存与互联网搜索相结合的模式进行了评估。从这些变量的组合中,我们生成了12890个对话,并分析了回答中回忆起来的错误信息。我们发现,当聊天机器人被问及有关错误信息的话题时,如果错误信息存在于长期记忆中,那么它回答错误信息的可能性要高出328%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信