{"title":"论基于后门的模型水印的弱点:信息论视角","authors":"Aoting Hu, Yanzhi Chen, Renjie Xie, Adrian Weller","doi":"arxiv-2409.06130","DOIUrl":null,"url":null,"abstract":"Safeguarding the intellectual property of machine learning models has emerged\nas a pressing concern in AI security. Model watermarking is a powerful\ntechnique for protecting ownership of machine learning models, yet its\nreliability has been recently challenged by recent watermark removal attacks.\nIn this work, we investigate why existing watermark embedding techniques\nparticularly those based on backdooring are vulnerable. Through an\ninformation-theoretic analysis, we show that the resilience of watermarking\nagainst erasure attacks hinges on the choice of trigger-set samples, where\ncurrent uses of out-distribution trigger-set are inherently vulnerable to\nwhite-box adversaries. Based on this discovery, we propose a novel model\nwatermarking scheme, In-distribution Watermark Embedding (IWE), to overcome the\nlimitations of existing method. To further minimise the gap to clean models, we\nanalyze the role of logits as watermark information carriers and propose a new\napproach to better conceal watermark information within the logits. Experiments\non real-world datasets including CIFAR-100 and Caltech-101 demonstrate that our\nmethod robustly defends against various adversaries with negligible accuracy\nloss (< 0.1%).","PeriodicalId":501332,"journal":{"name":"arXiv - CS - Cryptography and Security","volume":"52 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective\",\"authors\":\"Aoting Hu, Yanzhi Chen, Renjie Xie, Adrian Weller\",\"doi\":\"arxiv-2409.06130\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Safeguarding the intellectual property of machine learning models has emerged\\nas a pressing concern in AI security. Model watermarking is a powerful\\ntechnique for protecting ownership of machine learning models, yet its\\nreliability has been recently challenged by recent watermark removal attacks.\\nIn this work, we investigate why existing watermark embedding techniques\\nparticularly those based on backdooring are vulnerable. Through an\\ninformation-theoretic analysis, we show that the resilience of watermarking\\nagainst erasure attacks hinges on the choice of trigger-set samples, where\\ncurrent uses of out-distribution trigger-set are inherently vulnerable to\\nwhite-box adversaries. Based on this discovery, we propose a novel model\\nwatermarking scheme, In-distribution Watermark Embedding (IWE), to overcome the\\nlimitations of existing method. To further minimise the gap to clean models, we\\nanalyze the role of logits as watermark information carriers and propose a new\\napproach to better conceal watermark information within the logits. 
Experiments\\non real-world datasets including CIFAR-100 and Caltech-101 demonstrate that our\\nmethod robustly defends against various adversaries with negligible accuracy\\nloss (< 0.1%).\",\"PeriodicalId\":501332,\"journal\":{\"name\":\"arXiv - CS - Cryptography and Security\",\"volume\":\"52 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Cryptography and Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.06130\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Cryptography and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.06130","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective
Safeguarding the intellectual property of machine learning models has emerged as a pressing concern in AI security. Model watermarking is a powerful technique for protecting ownership of machine learning models, yet its reliability has recently been challenged by watermark removal attacks. In this work, we investigate why existing watermark embedding techniques, particularly those based on backdooring, are vulnerable. Through an information-theoretic analysis, we show that the resilience of watermarking against erasure attacks hinges on the choice of trigger-set samples, and that the current use of out-of-distribution trigger sets is inherently vulnerable to white-box adversaries. Based on this finding, we propose a novel model watermarking scheme, In-distribution Watermark Embedding (IWE), to overcome the limitations of existing methods. To further minimise the gap to clean models, we analyze the role of logits as carriers of watermark information and propose a new approach that better conceals watermark information within the logits. Experiments on real-world datasets, including CIFAR-100 and Caltech-101, demonstrate that our method robustly defends against various adversaries with negligible accuracy loss (< 0.1%).