An Indexed Bottom-up Approach for Publishing Anonymized Data

2012 Eighth International Conference on Computational Intelligence and Security. Pub Date: 2012-11-17. DOI: 10.1109/CIS.2012.148

A. Hoang, Minh Tran, A. Duong, I. Echizen


Citations: 1

Abstract

Sharing information is one of the most important parts of social activity. However, sharing information can leak users' personal information, and removing all direct identifiers is not enough to prevent this. Sweeney proposed applying k-anonymity to protect users' identities from linking attacks. Sweeney's algorithm finds the optimal anonymized dataset under a minimal distortion metric. Other authors have proposed other optimal algorithms, but these remain impractical because of their high computational cost. Another approach is to release a minimal anonymized dataset by applying heuristics. Wang and Fung proposed Bottom-up Generalization and Top-down Specialization (TDS), which publish a minimal anonymized dataset under an information loss metric and are more efficient. However, these algorithms still have some limitations. In this paper, we propose an algorithm that publishes anonymized datasets using a bottom-up generalization approach and an information loss metric. Our algorithm saves time by storing statistical information for later use. Experiments were performed on the Adult dataset, which was also used to evaluate all of the earlier algorithms. The results show that our algorithm can process a dataset of 949,662 records in 42.219 s. The classification error on the anonymized data produced by our algorithm is 3.8% lower than with Wang's algorithm.
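To make the terms in the abstract concrete, the following is a minimal sketch of a k-anonymity check and a greedy bottom-up generalization loop. It is not the authors' algorithm: the taxonomy, the attribute names, the sample records, and the scoring rule (the number of records left in groups smaller than k) are all assumptions chosen for illustration, and the score is a simple stand-in for the information loss metric the abstract mentions. The group counts held in a `Counter` only loosely echo the idea of keeping statistical information around for reuse.

```python
from collections import Counter

# Hypothetical taxonomy (an assumption): each value maps to its parent.
TAXONOMY = {
    "age": {"23": "20-29", "27": "20-29", "31": "30-39", "36": "30-39",
            "20-29": "any", "30-39": "any"},
    "edu": {"Bachelors": "University", "Masters": "University",
            "9th": "Secondary", "11th": "Secondary",
            "University": "any", "Secondary": "any"},
}
QI = ("age", "edu")  # quasi-identifier attributes (illustrative)

# Illustrative records; "income" is the class label and is never generalized.
SAMPLE = [
    {"age": "23", "edu": "Bachelors", "income": "<=50K"},
    {"age": "27", "edu": "Masters", "income": ">50K"},
    {"age": "31", "edu": "9th", "income": "<=50K"},
    {"age": "36", "edu": "11th", "income": "<=50K"},
]


def group_counts(records):
    """Count the records sharing each quasi-identifier combination."""
    return Counter(tuple(r[a] for a in QI) for r in records)


def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(c >= k for c in group_counts(records).values())


def generalize(records, attr):
    """Replace each value of attr with its taxonomy parent, if it has one."""
    parent = TAXONOMY[attr]
    return [{**r, attr: parent.get(r[attr], r[attr])} for r in records]


def bottom_up(records, k):
    """Generalize one attribute per step until the data is k-anonymous."""
    while not is_k_anonymous(records, k):
        candidates = []
        for attr in QI:
            trial = generalize(records, attr)
            if trial == records:
                continue  # attribute is already at the top of its taxonomy
            # Stand-in score: records still in groups smaller than k.
            violating = sum(c for c in group_counts(trial).values() if c < k)
            candidates.append((violating, attr, trial))
        if not candidates:
            raise ValueError("cannot reach k-anonymity with this taxonomy")
        # Lowest score wins; ties are broken by attribute name.
        _, _, records = min(candidates, key=lambda c: (c[0], c[1]))
    return records
```

Calling `bottom_up(SAMPLE, 2)` generalizes both attributes one level, so each quasi-identifier combination is shared by two records while the `income` label is left untouched. A real implementation would replace the stand-in score with an information loss measure and avoid recounting groups from scratch at every step.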
