{"title":"PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps.","authors":"Ruixuan Liu, Tianhao Wang, Yang Cao, Li Xiong","doi":"10.1145/3658644.3690279","DOIUrl":null,"url":null,"abstract":"<p><p>The pre-training and fine-tuning paradigm has demonstrated its effectiveness and has become the standard approach for tailoring language models to various tasks. Currently, community-based platforms offer easy access to various pre-trained models, as anyone can publish without strict validation processes. However, a released pre-trained model can be a privacy trap for fine-tuning datasets if it is carefully designed. In this work, we propose PreCurious framework to reveal the new attack surface where the attacker releases the pre-trained model and gets a black-box access to the final fine-tuned model. PreCurious aims to escalate the general privacy risk of both membership inference and data extraction on the fine-tuning dataset. The key intuition behind PreCurious is to manipulate the memorization stage of the pre-trained model and guide fine-tuning with a seemingly legitimate configuration. While empirical and theoretical evidence suggests that parameter-efficient and differentially private fine-tuning techniques can defend against privacy attacks on a fine-tuned model, PreCurious demonstrates the possibility of breaking up this invulnerability in a stealthy manner compared to fine-tuning on a benign pre-trained model. While DP provides some mitigation for membership inference attack, by further leveraging a sanitized dataset, PreCurious demonstrates potential vulnerabilities for targeted data extraction even under differentially private tuning with a strict privacy budget e.g. <math><mi>ϵ</mi> <mo>=</mo> <mn>0.05</mn></math> . Thus, PreCurious raises warnings for users on the potential risks of downloading pre-trained models from unknown sources, relying solely on tutorials or common-sense defenses, and releasing sanitized datasets even after perfect scrubbing.</p>","PeriodicalId":72687,"journal":{"name":"Conference on Computer and Communications Security : proceedings of the ... conference on computer and communications security. ACM Conference on Computer and Communications Security","volume":"2024 ","pages":"3511-3524"},"PeriodicalIF":0.0000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12094715/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Computer and Communications Security : proceedings of the ... conference on computer and communications security. ACM Conference on Computer and Communications Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3658644.3690279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/12/9 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The pre-training and fine-tuning paradigm has demonstrated its effectiveness and has become the standard approach for tailoring language models to various tasks. Currently, community-based platforms offer easy access to various pre-trained models, as anyone can publish without strict validation processes. However, a released pre-trained model can be a privacy trap for fine-tuning datasets if it is carefully designed. In this work, we propose the PreCurious framework to reveal a new attack surface in which the attacker releases the pre-trained model and gets black-box access to the final fine-tuned model. PreCurious aims to escalate the general privacy risk of both membership inference and data extraction on the fine-tuning dataset. The key intuition behind PreCurious is to manipulate the memorization stage of the pre-trained model and to guide fine-tuning with a seemingly legitimate configuration. While empirical and theoretical evidence suggests that parameter-efficient and differentially private fine-tuning techniques can defend against privacy attacks on a fine-tuned model, PreCurious demonstrates the possibility of stealthily breaking this invulnerability, compared to fine-tuning on a benign pre-trained model. While differential privacy (DP) provides some mitigation against membership inference attacks, by further leveraging a sanitized dataset, PreCurious demonstrates potential vulnerabilities to targeted data extraction even under differentially private tuning with a strict privacy budget, e.g., ε = 0.05. Thus, PreCurious raises warnings for users on the potential risks of downloading pre-trained models from unknown sources, relying solely on tutorials or common-sense defenses, and releasing sanitized datasets even after perfect scrubbing.
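To make the threat model concrete, the sketch below shows the standard loss-thresholding membership inference baseline that attacks of this kind amplify: the adversary queries the fine-tuned model in a black-box fashion and flags low-loss records as likely training members. This is a generic illustration, not the paper's PreCurious method (which additionally crafts the released pre-trained model); the model path and threshold are hypothetical placeholders.

```python
# Minimal sketch of a loss-threshold membership inference attack on a
# fine-tuned causal language model (generic baseline, not PreCurious itself).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/fine-tuned-model"  # hypothetical placeholder
THRESHOLD = 2.0  # hypothetical; in practice calibrated on reference/shadow data

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)
model.eval()

def sequence_loss(text: str) -> float:
    """Average per-token cross-entropy of `text` under the model."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

def is_member(text: str) -> bool:
    # Records memorized during fine-tuning tend to have lower loss,
    # so a loss below the calibrated threshold suggests membership.
    return sequence_loss(text) < THRESHOLD
```

A benignly pre-trained model keeps the loss gap between members and non-members small; the paper's point is that an adversarially crafted pre-trained model can widen this gap, so even simple thresholding like the above becomes far more accurate on the fine-tuning data.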