Towards Model Extraction Attacks in GAN-Based Image Translation via Domain Shift Mitigation

ArXiv Pub Date: 2024-03-12 DOI: 10.1609/aaai.v38i18.29966

Di Mi, Yanjun Zhang, Leo Yu Zhang, Shengshan Hu, Qi Zhong, Haizhuan Yuan, Shirui Pan

Abstract

Model extraction attacks (MEAs) enable an attacker to replicate the functionality of a victim deep neural network (DNN) model solely by querying its API service remotely, posing a severe threat to the security and integrity of pay-per-query DNN-based services. Although most research on MEAs has concentrated on neural classifiers, image-to-image translation (I2IT) tasks are increasingly prevalent in everyday activities. However, techniques developed for MEAs against DNN classifiers cannot be directly transferred to I2IT, so the vulnerability of I2IT models to MEAs is often underestimated. This paper unveils the threat of MEAs in I2IT tasks from a new perspective. Diverging from the traditional approach of bridging the distribution gap between attacker queries and victim training samples, we opt to mitigate the effect caused by the different distributions, known as the domain shift. This is achieved by introducing a new regularization term that penalizes high-frequency noise, and by seeking a flatter minimum to avoid overfitting to the shifted distribution. Extensive experiments on different image translation tasks, including image super-resolution and style transfer, are performed on different backbone victim models, and the new design consistently outperforms the baseline by a large margin across all metrics. A few real-life I2IT APIs are also verified to be extremely vulnerable to our attack, emphasizing the need for enhanced defenses and potentially revised API publishing policies.
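The abstract names two ingredients, a regularizer that penalizes high-frequency noise and a search for a flatter minimum, without giving their exact form. The sketch below illustrates both ideas under stated assumptions and should not be read as the paper's implementation: the FFT-based penalty, the `cutoff` parameter, and the function names `high_frequency_penalty` and `flat_minimum_step` are hypothetical, and the flat-minimum update is written in the style of sharpness-aware minimization as a stand-in.

```python
import numpy as np


def high_frequency_penalty(image: np.ndarray, cutoff: float = 0.25) -> float:
    """Mean squared amplitude of the high-pass-filtered 2-D image.

    `cutoff` is a radial frequency in cycles per pixel (0 to 0.5).
    Assumption: a hard FFT mask stands in for whatever frequency
    decomposition the authors actually use.
    """
    spectrum = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(image.shape[1])[None, :]  # horizontal frequencies
    mask = np.sqrt(fx ** 2 + fy ** 2) > cutoff    # keep only high frequencies
    # By Parseval's theorem, dividing the masked spectral energy by size**2
    # gives the per-pixel mean square of the high-frequency component.
    return float(np.sum(np.abs(spectrum[mask]) ** 2) / image.size ** 2)


def flat_minimum_step(weights, grad_fn, lr=0.1, rho=0.05):
    """One update that favors flat minima (sharpness-aware style).

    Assumption: `grad_fn(weights)` returns the loss gradient as an array.
    The gradient is evaluated at a point perturbed by `rho` along the
    ascent direction, then applied to the original weights.
    """
    grad = grad_fn(weights)
    perturbation = rho * grad / (np.linalg.norm(grad) + 1e-12)
    sharp_grad = grad_fn(weights + perturbation)
    return weights - lr * sharp_grad
```

In an attack pipeline along these lines, the penalty would be added with some weight to the surrogate model's reconstruction loss on its outputs, and the optimizer step would be replaced by the perturbed-gradient update; both the weighting and the choice of `rho` are tuning decisions the abstract does not specify.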
