Privacy Risks of General-Purpose Language Models

Xudong Pan, Mi Zhang, S. Ji, Min Yang
{"title":"Privacy Risks of General-Purpose Language Models","authors":"Xudong Pan, Mi Zhang, S. Ji, Min Yang","doi":"10.1109/SP40000.2020.00095","DOIUrl":null,"url":null,"abstract":"Recently, a new paradigm of building general-purpose language models (e.g., Google’s Bert and OpenAI’s GPT-2) in Natural Language Processing (NLP) for text feature extraction, a standard procedure in NLP systems that converts texts to vectors (i.e., embeddings) for downstream modeling, has arisen and starts to find its application in various downstream NLP tasks and real world systems (e.g., Google’s search engine [6]). To obtain general-purpose text embeddings, these language models have highly complicated architectures with millions of learnable parameters and are usually pretrained on billions of sentences before being utilized. As is widely recognized, such a practice indeed improves the state-of-the-art performance of many downstream NLP tasks. However, the improved utility is not for free. We find the text embeddings from general-purpose language models would capture much sensitive information from the plain text. Once being accessed by the adversary, the embeddings can be reverse-engineered to disclose sensitive information of the victims for further harassment. Although such a privacy risk can impose a real threat to the future leverage of these promising NLP tools, there are neither published attacks nor systematic evaluations by far for the mainstream industry-level language models. To bridge this gap, we present the first systematic study on the privacy risks of 8 state-of-the-art language models with 4 diverse case studies. By constructing 2 novel attack classes, our study demonstrates the aforementioned privacy risks do exist and can impose practical threats to the application of general-purpose language models on sensitive data covering identity, genome, healthcare and location. For example, we show the adversary with nearly no prior knowledge can achieve about 75% accuracy when inferring the precise disease site from Bert embeddings of patients’ medical descriptions. As possible countermeasures, we propose 4 different defenses (via rounding, differential privacy, adversarial training and subspace projection) to obfuscate the unprotected embeddings for mitigation purpose. With extensive evaluations, we also provide a preliminary analysis on the utility-privacy trade-off brought by each defense, which we hope may foster future mitigation researches.","PeriodicalId":6849,"journal":{"name":"2020 IEEE Symposium on Security and Privacy (SP)","volume":"26 1","pages":"1314-1331"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"98","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP40000.2020.00095","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 98

Abstract

Recently, a new paradigm of building general-purpose language models (e.g., Google’s Bert and OpenAI’s GPT-2) in Natural Language Processing (NLP) for text feature extraction, a standard procedure in NLP systems that converts texts to vectors (i.e., embeddings) for downstream modeling, has arisen and is starting to find application in various downstream NLP tasks and real-world systems (e.g., Google’s search engine [6]). To obtain general-purpose text embeddings, these language models have highly complicated architectures with millions of learnable parameters and are usually pretrained on billions of sentences before being used. As is widely recognized, this practice indeed improves the state-of-the-art performance of many downstream NLP tasks. However, the improved utility does not come for free. We find that the text embeddings produced by general-purpose language models capture much sensitive information from the plain text. Once accessed by an adversary, the embeddings can be reverse-engineered to disclose sensitive information about the victims for further harassment. Although such a privacy risk poses a real threat to the future use of these promising NLP tools, there are so far neither published attacks nor systematic evaluations for the mainstream industry-level language models. To bridge this gap, we present the first systematic study on the privacy risks of 8 state-of-the-art language models with 4 diverse case studies. By constructing 2 novel attack classes, our study demonstrates that the aforementioned privacy risks do exist and can pose practical threats to the application of general-purpose language models on sensitive data covering identity, genome, healthcare and location. For example, we show that an adversary with nearly no prior knowledge can achieve about 75% accuracy when inferring the precise disease site from Bert embeddings of patients’ medical descriptions. As possible countermeasures, we propose 4 different defenses (via rounding, differential privacy, adversarial training and subspace projection) that obfuscate the unprotected embeddings for mitigation purposes. With extensive evaluations, we also provide a preliminary analysis of the utility-privacy trade-off brought by each defense, which we hope may foster future mitigation research.
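To make the threat model concrete, the sketch below illustrates the general shape of an embedding-level attribute-inference attack: an adversary who obtains embeddings of sensitive texts trains a shallow probe on auxiliary labelled data and recovers a hidden attribute from the embeddings alone. This is a minimal illustration assuming the HuggingFace transformers and scikit-learn libraries; the model name, toy data and logistic-regression probe are placeholders for exposition, not the paper's exact attack pipeline.

```python
# Illustrative sketch (not the paper's pipeline): attribute inference
# against general-purpose text embeddings.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Return one fixed-size embedding per text ([CLS] token of the last layer)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    return out.last_hidden_state[:, 0, :].numpy()

# The adversary observes embeddings of sensitive texts (e.g., medical notes)
# and trains a shallow classifier on auxiliary labelled data to recover the
# hidden attribute (e.g., disease site) from the embeddings alone.
aux_texts  = ["patient reports persistent cough ...",   # hypothetical auxiliary corpus
              "lesion found on the left knee ..."]
aux_labels = ["lung", "bone"]                            # toy attribute labels
victim_embeddings = embed(["biopsy of the upper lobe confirms ..."])

probe = LogisticRegression(max_iter=1000).fit(embed(aux_texts), aux_labels)
print(probe.predict(victim_embeddings))                  # inferred sensitive attribute
```

Two of the four defenses named above, rounding and noise addition in the spirit of differential privacy, can be sketched just as simply; the precision and noise scale below are illustrative choices, not the paper's tuned parameters.

```python
# Minimal sketch of two lightweight obfuscation defenses applied to an
# embedding before it is released to downstream (possibly untrusted) parties.
import numpy as np

def round_embedding(emb, decimals=1):
    """Quantize each coordinate to reduce the information the embedding carries."""
    return np.round(emb, decimals=decimals)

def laplace_obfuscate(emb, scale=0.1, rng=None):
    """Add i.i.d. Laplace noise to each coordinate (illustrative noise scale)."""
    rng = rng or np.random.default_rng()
    return emb + rng.laplace(loc=0.0, scale=scale, size=emb.shape)
```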