On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective

arXiv - CS - Cryptography and Security | Pub Date: 2024-09-10 | DOI: arxiv-2409.06130

Aoting Hu, Yanzhi Chen, Renjie Xie, Adrian Weller


Abstract



Safeguarding the intellectual property of machine learning models has emerged as a pressing concern in AI security. Model watermarking is a powerful technique for protecting ownership of machine learning models, yet its reliability has recently been challenged by watermark removal attacks. In this work, we investigate why existing watermark embedding techniques, particularly those based on backdooring, are vulnerable. Through an information-theoretic analysis, we show that the resilience of watermarking against erasure attacks hinges on the choice of trigger-set samples, and that the out-of-distribution trigger sets in current use are inherently vulnerable to white-box adversaries. Based on this discovery, we propose a novel model watermarking scheme, In-distribution Watermark Embedding (IWE), to overcome the limitations of existing methods. To further minimize the gap to clean models, we analyze the role of logits as watermark information carriers and propose a new approach to better conceal watermark information within the logits. Experiments on real-world datasets including CIFAR-100 and Caltech-101 demonstrate that our method robustly defends against various adversaries with negligible accuracy loss (< 0.1%).
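For readers unfamiliar with trigger-set watermarks, the following is a minimal sketch of the generic backdoor-style ownership check that the abstract refers to. It is not the IWE scheme, whose details are not given on this page. The function names, the 0.9 threshold, and the toy stand-in models are all assumptions made for illustration.

```python
def trigger_set_match_rate(predict, trigger_inputs, trigger_labels):
    """Fraction of trigger inputs on which the model returns the owner's secret label."""
    hits = sum(1 for x, y in zip(trigger_inputs, trigger_labels) if predict(x) == y)
    return hits / len(trigger_inputs)


def verify_ownership(predict, trigger_inputs, trigger_labels, threshold=0.9):
    """Claim ownership if the match rate reaches the (assumed) threshold."""
    return trigger_set_match_rate(predict, trigger_inputs, trigger_labels) >= threshold


# Toy setup (hypothetical): 100 trigger inputs, 10 classes, secret labels
# assigned by the owner with an arbitrary fixed rule.
NUM_CLASSES = 10
trigger_inputs = list(range(100))
trigger_labels = [(3 * i + 1) % NUM_CLASSES for i in trigger_inputs]
secret = dict(zip(trigger_inputs, trigger_labels))


def watermarked_model(x):
    # Stand-in for a model that has memorised the trigger set during training.
    return secret[x]


def erased_model(x):
    # Stand-in for a model whose watermark was removed: it no longer follows
    # the secret labels and here always predicts class 0.
    return 0


watermarked_rate = trigger_set_match_rate(watermarked_model, trigger_inputs, trigger_labels)
erased_rate = trigger_set_match_rate(erased_model, trigger_inputs, trigger_labels)
```

In this toy setting the watermarked stand-in matches every trigger label, while the erased stand-in matches only at chance level (one class in ten), so only the former passes `verify_ownership`. A removal attack succeeds when it pushes a real model's match rate below the threshold without hurting its accuracy on ordinary data.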
