PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps

Ruixuan Liu, Tianhao Wang, Yang Cao, Li Xiong
{"title":"PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps.","authors":"Ruixuan Liu, Tianhao Wang, Yang Cao, Li Xiong","doi":"10.1145/3658644.3690279","DOIUrl":null,"url":null,"abstract":"<p><p>The pre-training and fine-tuning paradigm has demonstrated its effectiveness and has become the standard approach for tailoring language models to various tasks. Currently, community-based platforms offer easy access to various pre-trained models, as anyone can publish without strict validation processes. However, a released pre-trained model can be a privacy trap for fine-tuning datasets if it is carefully designed. In this work, we propose PreCurious framework to reveal the new attack surface where the attacker releases the pre-trained model and gets a black-box access to the final fine-tuned model. PreCurious aims to escalate the general privacy risk of both membership inference and data extraction on the fine-tuning dataset. The key intuition behind PreCurious is to manipulate the memorization stage of the pre-trained model and guide fine-tuning with a seemingly legitimate configuration. While empirical and theoretical evidence suggests that parameter-efficient and differentially private fine-tuning techniques can defend against privacy attacks on a fine-tuned model, PreCurious demonstrates the possibility of breaking up this invulnerability in a stealthy manner compared to fine-tuning on a benign pre-trained model. While DP provides some mitigation for membership inference attack, by further leveraging a sanitized dataset, PreCurious demonstrates potential vulnerabilities for targeted data extraction even under differentially private tuning with a strict privacy budget e.g. <math><mi>ϵ</mi> <mo>=</mo> <mn>0.05</mn></math> . Thus, PreCurious raises warnings for users on the potential risks of downloading pre-trained models from unknown sources, relying solely on tutorials or common-sense defenses, and releasing sanitized datasets even after perfect scrubbing.</p>","PeriodicalId":72687,"journal":{"name":"Conference on Computer and Communications Security : proceedings of the ... conference on computer and communications security. ACM Conference on Computer and Communications Security","volume":"2024 ","pages":"3511-3524"},"PeriodicalIF":0.0000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12094715/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Computer and Communications Security : proceedings of the ... conference on computer and communications security. ACM Conference on Computer and Communications Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3658644.3690279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/12/9 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The pre-training and fine-tuning paradigm has demonstrated its effectiveness and has become the standard approach for tailoring language models to various tasks. Currently, community-based platforms offer easy access to various pre-trained models, as anyone can publish one without a strict validation process. However, a released pre-trained model can be a privacy trap for fine-tuning datasets if it is carefully designed. In this work, we propose the PreCurious framework to reveal a new attack surface in which the attacker releases the pre-trained model and gets black-box access to the final fine-tuned model. PreCurious aims to escalate the general privacy risk of both membership inference and data extraction on the fine-tuning dataset. The key intuition behind PreCurious is to manipulate the memorization stage of the pre-trained model and to guide fine-tuning with a seemingly legitimate configuration. While empirical and theoretical evidence suggests that parameter-efficient and differentially private fine-tuning techniques can defend against privacy attacks on a fine-tuned model, PreCurious demonstrates the possibility of breaking this invulnerability in a stealthy manner, compared to fine-tuning on a benign pre-trained model. While DP provides some mitigation against membership inference attacks, by further leveraging a sanitized dataset, PreCurious demonstrates potential vulnerabilities to targeted data extraction even under differentially private tuning with a strict privacy budget, e.g., ϵ = 0.05. Thus, PreCurious raises warnings for users about the potential risks of downloading pre-trained models from unknown sources, relying solely on tutorials or common-sense defenses, and releasing sanitized datasets even after perfect scrubbing.
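To ground the threat model, below is a minimal sketch of the simplest attack in the class the abstract describes: loss-based membership inference against a black-box fine-tuned causal LM, assuming the attacker can query per-example loss (or token probabilities). This is not the paper's PreCurious attack itself; the model name ("gpt2" as a stand-in for a released fine-tuned model), the candidate record, and the decision threshold are all hypothetical placeholders.

```python
# Minimal sketch of a loss-based membership inference score against a
# fine-tuned causal LM. This illustrates the attack class the abstract
# refers to, NOT PreCurious itself; the model name, candidate text, and
# threshold are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for the released fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def lm_loss(text: str) -> float:
    """Average per-token cross-entropy of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    # With labels == input_ids, HF shifts labels internally and returns mean loss.
    return model(ids, labels=ids).loss.item()

def membership_score(text: str) -> float:
    # Lower loss => the model "remembers" the text better => more likely a
    # member of the fine-tuning set.
    return -lm_loss(text)

candidate = "Patient John Doe was diagnosed with ..."  # hypothetical record
threshold = -3.0  # placeholder; calibrated on known non-members in practice
print("member" if membership_score(candidate) > threshold else "non-member")
```

In practice the threshold is calibrated on known non-members or against a reference model; the paper's point is that a maliciously crafted pre-trained model can widen the member/non-member gap that this kind of score measures, even when the fine-tuning pipeline itself looks benign.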

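The abstract's strongest claim concerns differentially private fine-tuning. As a reference point, the toy sketch below shows what "differentially private tuning with a strict privacy budget, e.g., ϵ = 0.05" typically looks like with DP-SGD via the Opacus library. The model, data, and hyperparameters are placeholders on a toy classifier, not the paper's setup; this is the defense PreCurious targets, not part of the attack.

```python
# Toy illustration of DP-SGD fine-tuning with a strict target budget,
# using Opacus. Shapes and hyperparameters are placeholders; the point is
# only to show where the epsilon = 0.05 budget from the abstract enters.
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = torch.nn.Linear(16, 2)                # stand-in for tuned layers
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32)

engine = PrivacyEngine()
model, optimizer, loader = engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    epochs=1,
    target_epsilon=0.05,   # the strict budget quoted in the abstract
    target_delta=1e-5,
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

loss_fn = torch.nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()  # Opacus clips and noises per-sample grads
    optimizer.step()
print(f"spent epsilon ~ {engine.get_epsilon(delta=1e-5):.3f}")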