Data Privacy Examination against Semi-Supervised Learning

Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security. Pub Date: 2023-07-10. DOI: 10.1145/3579856.3590333

Jiadong Lou, Xu Yuan, Miao Pan, Hao Wang, N. Tzeng


Citations: 0

Abstract



Semi-supervised learning, which trains on a small amount of labeled data while collecting voluminous unlabeled data to aid training, has recently achieved promising performance, but it also raises a serious privacy concern: whether a user's data has been collected and used without authorization. In this paper, we propose a novel membership inference method against semi-supervised learning, serving to protect user data privacy. Because semi-supervised training involves both labeled and unlabeled data, the membership patterns of its training data cannot be well captured by existing membership inference solutions. To this end, we propose two new metrics, inter-consistency and intra-entropy, tailored specifically to the semi-supervised learning paradigm: the former measures the similarity, and the latter the cross-entropy, among prediction vectors obtained from perturbed versions. By exploiting the two metrics, our method can uncover the membership patterns imprinted on the prediction outputs of semi-supervised learning models, thus enabling effective membership inference. Extensive experiments compare our method with five rectified baseline inference techniques across four datasets and six semi-supervised learning algorithms. The results show that our inference method achieves over 80% accuracy under every experimental setting, substantially outperforming all baseline techniques.
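The abstract names the two metrics but does not give their formulas, so the following is only a rough sketch of the general idea, not the authors' definitions. It assumes that the perturbed versions are Gaussian-noised copies of one input, that similarity is mean pairwise cosine similarity, and that cross-entropy is averaged over ordered pairs of prediction vectors. All function names, the noise model, and the toy classifier are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pairwise_similarity(probs):
    """Mean cosine similarity over all pairs i != j (assumed stand-in for inter-consistency)."""
    unit = probs / np.linalg.norm(probs, axis=1, keepdims=True)
    sims = unit @ unit.T
    off_diag = ~np.eye(len(probs), dtype=bool)
    return float(sims[off_diag].mean())

def pairwise_cross_entropy(probs, eps=1e-12):
    """Mean cross-entropy H(p_i, p_j) over ordered pairs i != j (assumed stand-in for intra-entropy)."""
    log_p = np.log(probs + eps)
    ce = -(probs @ log_p.T)  # ce[i, j] = -sum_c p_i[c] * log p_j[c]
    off_diag = ~np.eye(len(probs), dtype=bool)
    return float(ce[off_diag].mean())

def membership_features(predict_fn, x, n_views=8, noise_std=0.05, seed=0):
    """Query the model on noisy copies of x and summarize its prediction vectors.

    predict_fn maps an (n_views, d) array to (n_views, n_classes) probabilities.
    Gaussian noise is an assumption; n_views must be at least 2.
    """
    rng = np.random.default_rng(seed)
    views = x[None, :] + rng.normal(0.0, noise_std, size=(n_views, x.shape[0]))
    probs = predict_fn(views)
    return pairwise_similarity(probs), pairwise_cross_entropy(probs)

# Hypothetical target model: a fixed linear classifier with 3 features and 2 classes.
_W = np.array([[1.0, -1.0], [0.5, 0.2], [-0.3, 0.8]])

def toy_predict(batch):
    return softmax(batch @ _W)

similarity, cross_entropy = membership_features(toy_predict, np.array([0.5, -1.0, 2.0]))
```

Under these assumptions, a sample whose perturbed copies yield nearly identical, confident predictions gets a similarity close to 1 and a low cross-entropy. How the two values are combined into a member or non-member decision is not stated in the abstract and is left open here.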
