Differential Privacy for Positive and Unlabeled Learning With Known Class Priors

Anh T. Pham, R. Raich
Published in: 2018 IEEE Statistical Signal Processing Workshop (SSP)
Publication date: 2018-06-01
DOI: 10.1109/SSP.2018.8450839
Citations: 2

Abstract

Despite the increasing attention to big data, there are several domains where labeled data is scarce or too costly to obtain. For example, for data from information retrieval, gene analysis, and social network analysis, only training samples from the positive class are annotated, while the remaining unlabeled training samples consist of both unlabeled positive and unlabeled negative samples. The specific positive and unlabeled (PU) data from those domains necessitates a mechanism to learn a two-class classifier from only one-class labeled data. Moreover, because data from those domains is highly sensitive and private, preserving the privacy of training samples is essential. This paper addresses the challenge of private PU learning by designing a differentially private algorithm for positive and unlabeled data. We first propose a learning framework for the PU setting when the class prior probability is known, with a theoretical guarantee of convergence to the optimal classifier. We then propose a privacy-preserving mechanism for the designed framework where the privacy and utility are both theoretically and empirically proved.
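The abstract does not spell out the paper's specific framework or its privacy mechanism, but as background on how a two-class risk can be estimated from only positive and unlabeled samples when the class prior π = P(y = +1) is known, here is a minimal sketch of the standard unbiased PU risk estimator (the du Plessis-style decomposition, not necessarily the exact formulation used in this paper). The key identity is that the unlabeled-data loss decomposes as E_u[ℓ] = π·E_p[ℓ] + (1−π)·E_n[ℓ], so the negative-class term can be recovered without any negative labels:

```python
import math

def logistic_loss(z, y):
    """Logistic loss l(z, y) = log(1 + exp(-y*z)), computed stably."""
    m = -y * z
    if m > 0:
        return m + math.log1p(math.exp(-m))
    return math.log1p(math.exp(m))

def pu_risk(scores_pos, scores_unl, prior):
    """Unbiased PU risk estimate with a known class prior.

    scores_pos: classifier scores g(x) on labeled positive samples
    scores_unl: classifier scores g(x) on unlabeled samples
    prior:      known class prior pi = P(y = +1)
    """
    mean = lambda xs, y: sum(logistic_loss(z, y) for z in xs) / len(xs)
    # E_u[l(g,-1)] = pi * E_p[l(g,-1)] + (1-pi) * E_n[l(g,-1)], so the
    # negative-class risk (1-pi) * E_n[l(g,-1)] equals
    # E_u[l(g,-1)] - pi * E_p[l(g,-1)]:
    return (prior * mean(scores_pos, +1)
            + mean(scores_unl, -1)
            - prior * mean(scores_pos, -1))
```

When the unlabeled sample is exactly a π-weighted mixture of positives and negatives, this estimator coincides with the fully supervised risk, which is what makes minimizing it converge to the optimal classifier. A differentially private variant would then add calibrated noise to this empirical-risk-minimization procedure (e.g., via output or objective perturbation); the paper's concrete mechanism is described in the full text.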