An Explainable Intellectual Property Protection Method for Deep Neural Networks Based on Intrinsic Features

Mingfu Xue;Xin Wang;Yinghao Wu;Shifeng Ni;Leo Yu Zhang;Yushu Zhang;Weiqiang Liu
DOI: 10.1109/TAI.2024.3388389
Journal: IEEE Transactions on Artificial Intelligence
Publication date: 2024-04-16
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10500746/
Citations: 0

Abstract

Intellectual property (IP) protection for deep neural networks (DNNs) has raised serious concerns in recent years. Most existing works embed watermarks in the DNN model for IP protection, which requires modifying the model, and they do not consider interpretability. In this article, we propose, for the first time, an interpretable IP protection method for DNNs based on explainable artificial intelligence. Compared with existing works, the proposed method does not modify the DNN model, and the ownership verification decision is interpretable. We extract the intrinsic features of the DNN model using deep Taylor decomposition. Because the intrinsic features are composed of the unique interpretation of the model's decision, they can be regarded as a fingerprint of the model. If the fingerprint of a suspected model is the same as that of the original model, the suspected model is considered a pirated model. Experimental results demonstrate that the fingerprints can be successfully used to verify ownership of the model and that the model's test accuracy is not affected. Furthermore, the proposed method is robust to fine-tuning, pruning, watermark overwriting, and adaptive attacks.
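
The abstract does not spell out the extraction or comparison procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a tiny bias-free ReLU network with non-negative inputs, so that the z+ propagation rule commonly associated with deep Taylor decomposition can be applied at every layer. The names `relu_forward`, `zplus_backward`, `fingerprint`, `cosine`, and `is_suspected_copy`, the cosine-similarity comparison, and the 0.95 threshold are all hypothetical choices made for illustration.

```python
import math


def relu_forward(x, weights):
    """Run a bias-free ReLU network and return the activations of every layer.

    weights is a list of matrices; W[i][j] connects input unit i to output unit j.
    """
    activations = [list(x)]
    for W in weights:
        a = activations[-1]
        z = [sum(a[i] * W[i][j] for i in range(len(a))) for j in range(len(W[0]))]
        activations.append([max(0.0, v) for v in z])
    return activations


def zplus_backward(a, W, relevance_out):
    """Redistribute relevance to a layer's inputs with the z+ rule.

    Each output unit j passes its relevance to input i in proportion to
    a[i] * max(0, W[i][j]). Assumes the inputs a are non-negative.
    """
    relevance_in = [0.0] * len(a)
    for j, r_j in enumerate(relevance_out):
        contrib = [a[i] * max(0.0, W[i][j]) for i in range(len(a))]
        total = sum(contrib)
        if total <= 0.0:
            continue  # nothing to redistribute through this unit
        for i, c in enumerate(contrib):
            relevance_in[i] += c / total * r_j
    return relevance_in


def fingerprint(weights, probes):
    """Concatenate the input relevance maps of all probe inputs (hypothetical)."""
    fp = []
    for x in probes:
        acts = relu_forward(x, weights)
        out = acts[-1]
        top = max(range(len(out)), key=lambda k: out[k])
        # Start from the score of the predicted class only.
        relevance = [out[k] if k == top else 0.0 for k in range(len(out))]
        for a, W in zip(reversed(acts[:-1]), reversed(weights)):
            relevance = zplus_backward(a, W, relevance)
        fp.extend(relevance)
    return fp


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(u, v))
    nu = math.sqrt(sum(p * p for p in u))
    nv = math.sqrt(sum(q * q for q in v))
    return dot / (nu * nv) if nu > 0.0 and nv > 0.0 else 0.0


def is_suspected_copy(original, suspect, probes, threshold=0.95):
    """Flag the suspect if its fingerprint is close to the original's.

    The threshold is an assumed value, not one taken from the paper.
    """
    return cosine(fingerprint(original, probes), fingerprint(suspect, probes)) >= threshold


if __name__ == "__main__":
    original = [[[1.0, -1.0], [0.5, 2.0]], [[1.0, 0.0], [-1.0, 1.0]]]
    # A lightly perturbed copy, standing in for a fine-tuned pirated model.
    tuned = [[[1.02, -0.98], [0.5, 2.05]], [[1.0, 0.0], [-1.0, 0.97]]]
    # An independently chosen set of weights, standing in for an unrelated model.
    unrelated = [[[-1.0, 2.0], [1.5, -0.5]], [[0.5, 1.0], [1.0, -1.0]]]
    probes = [[1.0, 2.0], [2.0, 0.5]]
    for name, model in [("tuned", tuned), ("unrelated", unrelated)]:
        fp_sim = cosine(fingerprint(original, probes), fingerprint(model, probes))
        print(name, "similarity:", fp_sim, "flagged:", is_suspected_copy(original, model, probes))
```

Because the fingerprint is read from relevance maps computed on the unmodified weights, this kind of check leaves the model itself untouched, which matches the abstract's claim that no modification is needed. The relevance maps also serve as a human-readable explanation of why two models were judged similar.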