{"title":"Catch Me if You Can: Detecting Unauthorized Data Use in Deep Learning Models","authors":"Zitao Chen, Karthik Pattabiraman","doi":"arxiv-2409.06280","DOIUrl":null,"url":null,"abstract":"The rise of deep learning (DL) has led to a surging demand for training data,\nwhich incentivizes the creators of DL models to trawl through the Internet for\ntraining materials. Meanwhile, users often have limited control over whether\ntheir data (e.g., facial images) are used to train DL models without their\nconsent, which has engendered pressing concerns. This work proposes MembershipTracker, a practical data provenance tool that\ncan empower ordinary users to take agency in detecting the unauthorized use of\ntheir data in training DL models. We view tracing data provenance through the\nlens of membership inference (MI). MembershipTracker consists of a lightweight\ndata marking component to mark the target data with small and targeted changes,\nwhich can be strongly memorized by the model trained on them; and a specialized\nMI-based verification process to audit whether the model exhibits strong\nmemorization on the target samples. Overall, MembershipTracker only requires the users to mark a small fraction\nof data (0.005% to 0.1% in proportion to the training set), and it enables the\nusers to reliably detect the unauthorized use of their data (average 0%\nFPR@100% TPR). We show that MembershipTracker is highly effective across\nvarious settings, including industry-scale training on the full-size\nImageNet-1k dataset. 
We finally evaluate MembershipTracker under multiple\nclasses of countermeasures.","PeriodicalId":501332,"journal":{"name":"arXiv - CS - Cryptography and Security","volume":"7 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Cryptography and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.06280","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
The rise of deep learning (DL) has led to a surging demand for training data, which incentivizes the creators of DL models to trawl the Internet for training material. Meanwhile, users often have little control over whether their data (e.g., facial images) are used to train DL models without their consent, which has raised pressing concerns.

This work proposes MembershipTracker, a practical data provenance tool that empowers ordinary users to detect the unauthorized use of their data in training DL models. We view tracing data provenance through the lens of membership inference (MI). MembershipTracker consists of a lightweight data marking component that marks the target data with small, targeted changes, which are strongly memorized by any model trained on them; and a specialized MI-based verification process that audits whether the model exhibits strong memorization of the target samples.

Overall, MembershipTracker requires users to mark only a small fraction of data (0.005% to 0.1% of the training set), and it enables them to reliably detect the unauthorized use of their data (average 0% FPR@100% TPR). We show that MembershipTracker is highly effective across various settings, including industry-scale training on the full-size ImageNet-1k dataset. Finally, we evaluate MembershipTracker under multiple classes of countermeasures.
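The abstract's two-component pipeline (marking target data so a trained model memorizes it, then auditing memorization with an MI-style test) can be illustrated with a minimal sketch. This is not the paper's actual method: the marking function, perturbation strength, and loss-ranking score below are hypothetical stand-ins for the real design, which the abstract does not specify.

```python
import math
import random

random.seed(0)

def mark_samples(samples, strength=0.02):
    """Hypothetical marking step: add one small, fixed random pattern
    to every sample the user owns (stand-in for the paper's 'small and
    targeted changes')."""
    dim = len(samples[0])
    pattern = [random.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(p * p for p in pattern))
    pattern = [p / norm for p in pattern]
    marked = [[x + strength * p for x, p in zip(s, pattern)]
              for s in samples]
    return marked, pattern

def membership_score(loss_fn, marked, references):
    """MI-style audit: a model that memorized the marked samples should
    assign them lower loss than typical non-member reference samples.
    Returns the fraction of reference losses exceeding the average
    loss on the marked samples (closer to 1.0 = stronger evidence)."""
    target = sum(loss_fn(s) for s in marked) / len(marked)
    return sum(1 for r in references if loss_fn(r) > target) / len(references)
```

In use, a score near 1.0 would suggest the suspect model was trained on the marked data, while a score near 0.5 is consistent with non-membership; the paper's reported 0% FPR@100% TPR corresponds to these two cases being perfectly separable in its evaluation.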