Differential Privacy for Positive and Unlabeled Learning With Known Class Priors

Anh T. Pham, R. Raich
Published in: 2018 IEEE Statistical Signal Processing Workshop (SSP)
Publication date: 2018-06-01
DOI: 10.1109/SSP.2018.8450839
Citations: 2

Abstract

Despite the increasing attention to big data, there are several domains where labeled data is scarce or too costly to obtain. For example, for data from information retrieval, gene analysis, and social network analysis, only training samples from the positive class are annotated, while the remaining unlabeled training samples consist of both unlabeled positive and unlabeled negative samples. The specific positive and unlabeled (PU) data from those domains necessitates a mechanism to learn a two-class classifier from only one-class labeled data. Moreover, because data from those domains is highly sensitive and private, preserving the privacy of training samples is essential. This paper addresses the challenge of private PU learning by designing a differentially private algorithm for positive and unlabeled data. We first propose a learning framework for the PU setting when the class prior probability is known, with a theoretical guarantee of convergence to the optimal classifier. We then propose a privacy-preserving mechanism for the designed framework where the privacy and utility are both theoretically and empirically proved.
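The abstract does not spell out the paper's specific framework or its privacy mechanism, but as background on how a two-class risk can be estimated from only positive and unlabeled samples when the class prior π = P(y = +1) is known, here is a minimal sketch of the standard unbiased PU risk estimator (the du Plessis-style decomposition, not necessarily the exact formulation used in this paper). The key identity is that the unlabeled-data loss decomposes as E_u[ℓ] = π·E_p[ℓ] + (1−π)·E_n[ℓ], so the negative-class term can be recovered without any negative labels:

```python
import math

def logistic_loss(z, y):
    """Logistic loss l(z, y) = log(1 + exp(-y*z)), computed stably."""
    m = -y * z
    if m > 0:
        return m + math.log1p(math.exp(-m))
    return math.log1p(math.exp(m))

def pu_risk(scores_pos, scores_unl, prior):
    """Unbiased PU risk estimate with a known class prior.

    scores_pos: classifier scores g(x) on labeled positive samples
    scores_unl: classifier scores g(x) on unlabeled samples
    prior:      known class prior pi = P(y = +1)
    """
    mean = lambda xs, y: sum(logistic_loss(z, y) for z in xs) / len(xs)
    # E_u[l(g,-1)] = pi * E_p[l(g,-1)] + (1-pi) * E_n[l(g,-1)], so the
    # negative-class risk (1-pi) * E_n[l(g,-1)] equals
    # E_u[l(g,-1)] - pi * E_p[l(g,-1)]:
    return (prior * mean(scores_pos, +1)
            + mean(scores_unl, -1)
            - prior * mean(scores_pos, -1))
```

When the unlabeled sample is exactly a π-weighted mixture of positives and negatives, this estimator coincides with the fully supervised risk, which is what makes minimizing it converge to the optimal classifier. A differentially private variant would then add calibrated noise to this empirical-risk-minimization procedure (e.g., via output or objective perturbation); the paper's concrete mechanism is described in the full text.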