Differentially Private Decision Tree Based on Pearson’s Correlation Coefficient

Qi Yan, Jinyan Wang, Songfeng Liu, De Li
2021 11th International Conference on Information Science and Technology (ICIST), published 2021-05-21
DOI: 10.1109/ICIST52614.2021.9440604 (https://doi.org/10.1109/ICIST52614.2021.9440604)

Abstract

As one of the popular machine learning algorithms, decision trees can be applied to both classification and regression tasks. Companies and researchers train many decision tree models on data obtained from users and use them for business decisions. However, decision trees may pose a privacy risk when the training data contain sensitive personal information. In this paper, we propose an effective differentially private decision tree algorithm based on Pearson’s correlation coefficient, called Diff-PCCDT. At every intermediate node, the exponential mechanism selects the splitting attribute, with the absolute value of Pearson’s correlation coefficient as the quality function. Laplace noise is added to the true class counts of every leaf node so that the decision tree does not leak private information during classification. Furthermore, a parallel version of Diff-PCCDT over Map-Reduce, called DiffMR-PCCDT, is proposed for big data scenarios. The experimental results demonstrate that Diff-PCCDT can effectively train a differentially private decision tree and that its parallel implementation, DiffMR-PCCDT, can handle big data classification problems.
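The two private steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `sensitivity` parameter of the quality function, and the way the privacy budget `epsilon` is passed to each step are all assumptions made for illustration; the abstract does not state the sensitivity bound or the budget allocation used by Diff-PCCDT. The sketch uses the standard exponential mechanism, which samples a candidate with probability proportional to exp(epsilon * q / (2 * sensitivity)), and the standard Laplace mechanism for counts, assuming that adding or removing one record changes a single class count by at most 1.

```python
import numpy as np


def pearson_abs(x, y):
    """Absolute Pearson correlation between two 1-D arrays (0.0 if either is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return 0.0 if denom == 0 else abs((xc * yc).sum() / denom)


def exponential_mechanism_probs(scores, epsilon, sensitivity):
    """Selection probabilities proportional to exp(epsilon * score / (2 * sensitivity))."""
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def choose_split_attribute(X, y, epsilon, sensitivity, rng):
    """Sample a splitting attribute index, scored by |Pearson correlation| with the label.

    `sensitivity` is a caller-supplied bound on how much the score can change
    when one record changes (an assumption here, not taken from the paper).
    """
    scores = [pearson_abs(X[:, j], y) for j in range(X.shape[1])]
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return int(rng.choice(len(scores), p=probs)), probs


def noisy_class_counts(labels, n_classes, epsilon, rng):
    """Class counts at a leaf with Laplace(1 / epsilon) noise added to each count."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=n_classes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: attribute 0 equals the label, attribute 1 is only weakly related.
    X = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [1, 1], [0, 0]], dtype=float)
    y = np.array([0, 1, 0, 1, 1, 0])
    attr, probs = choose_split_attribute(X, y, epsilon=1.0, sensitivity=1.0, rng=rng)
    print("selection probabilities:", probs, "chosen attribute:", attr)
    print("noisy leaf counts:", noisy_class_counts(y, n_classes=2, epsilon=1.0, rng=rng))
```

A larger `epsilon` concentrates the selection probability on the attribute with the highest absolute correlation and shrinks the noise on the leaf counts; a smaller `epsilon` does the opposite, trading accuracy for privacy.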