A decision tree based quasi-identifier perturbation technique for preserving privacy in data mining

2009 Third International Conference on Research Challenges in Information Science. Pub Date: 2009-04-22. DOI: 10.1109/RCIS.2009.5089282

Bi-Ru Dai, YangChih Lin


Citations: 1


A decision tree based quasi-identifier perturbation technique for preserving privacy in data mining

Classification is an important task in data mining, and the decision tree is one of the most popular techniques for classification analysis. Some data sources contain private personal information that people are unwilling to reveal. Disclosing person-specific data may endanger thousands of people, so a dataset should be protected before it is released for mining. However, techniques that hide private information usually modify the original dataset without considering the effect on the prediction accuracy of a classification model. In this paper, we propose an algorithm that protects personal privacy for classification models based on decision trees. Our goal is to hide all person-specific information with minimal data perturbation while maintaining the prediction capability of the decision tree classifier. As the experiments demonstrate, the proposed algorithm successfully hides private information with less disturbance to the classifier.
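The abstract does not describe the algorithm's steps, so the following is only a minimal sketch of the general notion of a quasi-identifier, not the authors' method. It assumes a quasi-identifier is a set of attributes whose combined values may single out a person, and it flags records whose combination is shared by fewer than `k` records. The function name `risky_records`, the attribute names, and the sample data are all hypothetical.

```python
from collections import Counter

def risky_records(records, quasi_identifiers, k=2):
    """Return indices of records whose quasi-identifier value combination
    is shared by fewer than k records (illustrative assumption only)."""
    # Build one key per record from the chosen quasi-identifier attributes.
    keys = [tuple(r[a] for a in quasi_identifiers) for r in records]
    # Count how many records share each combination.
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] < k]

# Hypothetical toy dataset; "label" stands in for the class attribute.
records = [
    {"age": 34, "zip": "10001", "sex": "F", "label": "yes"},
    {"age": 34, "zip": "10001", "sex": "F", "label": "no"},
    {"age": 51, "zip": "10002", "sex": "M", "label": "yes"},
]

print(risky_records(records, ["age", "zip", "sex"]))
```

Records flagged this way would be candidates for perturbation. Which attributes to perturb so that the decision tree is disturbed as little as possible is the part the paper addresses, and it is not reproduced here.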

