A privacy protection technique for publishing data mining models and research data

Yu Fu, Zhiyuan Chen, Güneş Koru, A. Gangopadhyay
ACM Trans. Manag. Inf. Syst., published 2010-12-01. DOI: 10.1145/1877725.1877732 (https://doi.org/10.1145/1877725.1877732)
Citations: 8

Abstract

Data mining techniques are widely used in research disciplines such as medicine, the life sciences, and the social sciences to extract useful knowledge (such as mining models) from research data. Research data often needs to be published along with the data mining model so that the model can be verified or the data reanalyzed. The privacy of the published data must be protected, however, because the data is otherwise subject to misuse such as linking attacks, which makes privacy protection methods necessary. Existing methods consider only privacy protection and do not guarantee that the same mining models can be built from the sanitized data, so the published models cannot be verified using the sanitized data. This article proposes a technique that not only protects privacy but also guarantees that the same model, in the form of decision trees or regression trees, can be built from the sanitized data. We also show experimentally that other mining techniques can be used to reanalyze the sanitized data. This technique can be used to promote the sharing of research data.
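
The verification property described in the abstract can be illustrated with a small check: build a decision tree from the original data and from a sanitized copy, then compare the two. The sketch below is not the technique proposed in the article, whose details the abstract does not give. The toy records, the Gini-based learner, and the hypothetical `order_preserving_mask` sanitizer (a strictly increasing rescaling, which offers little real privacy) are all assumptions chosen only to make the check runnable. Because such a rescaling preserves the order of every attribute, the learner picks the same split attributes and the same record partitions; only the numeric thresholds change.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) with lowest weighted Gini, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:  # accept only strict improvements
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, ids=None):
    """Grow a tree greedily; leaves record the majority label and record ids."""
    ids = list(range(len(rows))) if ids is None else ids
    split = best_split(rows, labels)
    if split is None:
        return ("leaf", Counter(labels).most_common(1)[0][0], tuple(ids))
    f, t = split
    go_left = [r[f] <= t for r in rows]

    def side(keep):
        pick = [k for k, g in enumerate(go_left) if g == keep]
        return build_tree([rows[k] for k in pick],
                          [labels[k] for k in pick],
                          [ids[k] for k in pick])

    return ("split", f, t, side(True), side(False))

def shape(node):
    """Drop thresholds, keeping split attributes, leaf labels, and record ids."""
    if node[0] == "leaf":
        return node
    _, f, _t, left, right = node
    return ("split", f, shape(left), shape(right))

def order_preserving_mask(rows):
    """Hypothetical stand-in sanitizer: strictly increasing per-attribute rescaling."""
    return [(2 * age + 10, 3 * income + 7) for age, income in rows]

# Toy records (age, income) with a class label; purely illustrative data.
rows = [(25, 30), (32, 45), (47, 52), (51, 38),
        (62, 80), (23, 60), (45, 70), (58, 41)]
labels = ["no", "no", "yes", "no", "yes", "no", "yes", "no"]

original_tree = build_tree(rows, labels)
sanitized_tree = build_tree(order_preserving_mask(rows), labels)

# Verification step: the sanitized data should reproduce the published structure.
print(shape(original_tree) == shape(sanitized_tree))
```

A reader holding only the sanitized records could run the same comparison against a published tree. A real sanitization method would need to provide both this reproducibility and meaningful protection against linking attacks, which the rescaling used here does not.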