QUDA: Query-Limited Data-Free Model Extraction
Zijun Lin, Ke Xu, Chengfang Fang, Huadi Zheng, Aneez Ahmed Jaheezuddin, Jie Shi
Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security, July 10, 2023. DOI: 10.1145/3579856.3590336
Abstract
A model extraction attack typically refers to extracting non-public information from a black-box machine learning model. Its unauthorized nature poses a significant threat to the intellectual property rights of model owners. Using well-designed queries and the predictions returned by the victim model, an adversary can train a clone model from scratch that replicates the victim model's functionality. Recently, several methods have been proposed to perform model extraction attacks without using any in-distribution data (the data-free setting). Although these methods achieve high clone accuracy, their query budgets are typically around 10 million, and even exceed 20 million on some datasets, which makes model stealing expensive and easy to defend against by limiting the number of queries. To illustrate the severe threat posed by model extraction attacks with a limited query budget in realistic scenarios, we propose QUDA, a novel QUery-limited DAta-free model extraction attack that incorporates a GAN pre-trained on a public, unrelated dataset to provide a weak image prior, together with deep reinforcement learning to make the query generation strategy more efficient. Compared with the state-of-the-art data-free model extraction method, QUDA achieves better results under a query-limited condition (a 0.1M query budget) on the FMNIST and CIFAR-10 datasets, and even outperforms the baseline in most cases while using only 10% of its query budget. QUDA serves as a warning that relying solely on limiting the number of queries, or on the confidentiality of the training data, is not sufficient to protect a model's security and privacy. Potential countermeasures, such as detection-based defenses, are also discussed.
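For intuition, the sketch below shows how the ingredients named in the abstract could fit together: a pre-trained GAN generator supplies the weak image prior, a reinforcement-learning policy over the latent space steers query generation, and the clone is trained on the victim's soft labels. This is a minimal, hypothetical PyTorch sketch, not QUDA's actual implementation; the toy architectures, the Gaussian latent policy, the L1-disagreement reward, and all hyperparameters (LATENT_DIM, BUDGET, etc.) are illustrative assumptions.

```python
# Hypothetical sketch of a query-limited, data-free extraction loop in the
# spirit of QUDA. Every architecture, reward, and hyperparameter below is
# an illustrative assumption, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, NUM_CLASSES, BUDGET, BATCH = 64, 10, 100_000, 128  # 0.1M queries

# Stand-ins for the real components: in practice `generator` is a GAN
# pre-trained on a public, unrelated dataset (the weak image prior) and
# `victim` is a black box that only returns soft labels over an API.
generator = nn.Sequential(nn.Linear(LATENT_DIM, 3 * 32 * 32), nn.Tanh())
victim = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, NUM_CLASSES))
clone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, NUM_CLASSES))

# The RL "policy" here is a Gaussian over latent codes; REINFORCE shifts it
# toward latent regions whose images teach the clone the most per query.
policy_mu = torch.zeros(LATENT_DIM, requires_grad=True)
policy_log_sigma = torch.zeros(LATENT_DIM, requires_grad=True)
policy_opt = torch.optim.Adam([policy_mu, policy_log_sigma], lr=1e-3)
clone_opt = torch.optim.Adam(clone.parameters(), lr=1e-3)

for step in range(BUDGET // BATCH):
    # Sample latent codes from the current policy and render query images.
    dist = torch.distributions.Normal(policy_mu, policy_log_sigma.exp())
    z = dist.sample((BATCH,))                            # (BATCH, LATENT_DIM)
    with torch.no_grad():
        images = generator(z).view(BATCH, 3, 32, 32)
        victim_probs = F.softmax(victim(images), dim=1)  # one batch of queries

    # Clone update: match the victim's soft labels via KL divergence.
    clone_log_probs = F.log_softmax(clone(images), dim=1)
    clone_loss = F.kl_div(clone_log_probs, victim_probs, reduction="batchmean")
    clone_opt.zero_grad()
    clone_loss.backward()
    clone_opt.step()

    # Policy update (REINFORCE): reward latent codes where clone and victim
    # still disagree, so the remaining budget is spent where the clone is
    # still wrong. The reward is normalized as a simple variance baseline.
    with torch.no_grad():
        clone_probs = F.softmax(clone(images), dim=1)
        reward = (victim_probs - clone_probs).abs().sum(dim=1)
        reward = (reward - reward.mean()) / (reward.std() + 1e-8)
    log_prob = dist.log_prob(z).sum(dim=1)
    policy_loss = -(reward * log_prob).mean()
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()
```

A score-function estimator such as REINFORCE fits this setting because the reward depends on the victim's API responses, which the attacker cannot differentiate through; QUDA's actual policy architecture and reward design are described in the paper itself.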