Differentially Private Decision Tree Based on Pearson’s Correlation Coefficient
Authors: Qi Yan, Jinyan Wang, Songfeng Liu, De Li
Published in: 2021 11th International Conference on Information Science and Technology (ICIST)
Publication date: 2021-05-21
DOI: 10.1109/ICIST52614.2021.9440604 (https://doi.org/10.1109/ICIST52614.2021.9440604)
Citations: 0
Abstract
Decision trees are among the most popular machine learning algorithms and can be applied to both classification and regression tasks. Companies and researchers often train decision tree models on data obtained from users to support business decisions. However, decision trees can pose a privacy risk when the training data contains sensitive personal information. In this paper, we propose an effective differentially private decision tree algorithm based on Pearson’s correlation coefficient, called Diff-PCCDT. At every intermediate node, the exponential mechanism selects the splitting attribute, using the absolute value of Pearson’s correlation coefficient as the quality function. Laplace noise is added to the true class counts of every leaf node so that the decision tree does not leak private information during classification. Furthermore, a parallel version of Diff-PCCDT over Map-Reduce, called DiffMR-PCCDT, is proposed for big data scenarios. The experimental results demonstrate that Diff-PCCDT can effectively train differentially private decision trees and that its parallel implementation, DiffMR-PCCDT, can handle big data classification problems.
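The two privacy mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: the correlation is taken between a numerically encoded attribute and the class label, the quality-function sensitivity is set to 1.0 (a conservative bound, since the absolute correlation lies in [0, 1]), each class count is given sensitivity 1, and all function names and the toy data are hypothetical.

```python
import math
import random


def pearson_abs(xs, ys):
    """Absolute Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return abs(cov / math.sqrt(var_x * var_y))


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Return index i with probability proportional to
    exp(epsilon * scores[i] / (2 * sensitivity))."""
    top = max(scores)  # shifting by the max avoids overflow, same distribution
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_class_counts(labels, classes, epsilon, rng):
    """Leaf-node class counts with Laplace(1/epsilon) noise added to each.
    Assumption: one record changes a single count by at most 1."""
    return {
        c: sum(1 for y in labels if y == c) + laplace_noise(1 / epsilon, rng)
        for c in classes
    }


# Toy data (hypothetical): two numeric attributes and a binary class label.
rng = random.Random(0)
attributes = {
    "income": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "shoe_size": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
labels = [0, 0, 0, 1, 1, 1]

names = list(attributes)
scores = [pearson_abs(attributes[name], labels) for name in names]
# Assumed sensitivity of 1.0; the per-node budget epsilon=1.0 is arbitrary.
chosen = names[exponential_mechanism(scores, epsilon=1.0, sensitivity=1.0, rng=rng)]
leaf_counts = noisy_class_counts(labels, classes=[0, 1], epsilon=1.0, rng=rng)
print(chosen, leaf_counts)
```

A smaller `epsilon` flattens the selection probabilities and widens the Laplace noise, trading accuracy for privacy. How the total budget is divided between internal nodes and leaves is a design choice this sketch leaves open.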